Untitled

CHAPTER 9

The Language in Action: A Gallery of Programming Examples.

In this, the last chapter of the introductory part of this book, we illustrate the use of SETL by giving a variety of programs which exhibit its features and can serve as useful models ol style. Some of the smaller programs present significant algorithms; the larger examples show how more substantial programming problems and applications can be addressed.

9.1 Eulerian Paths in a Graph

A graph is a collection of nodes, pairs of which are connected by edges (see Section XXX). Graphs come in two varieties, directed graphs, each of whose edges has a specified starting node and target node, and undirected graphs, whose edges can be traversed in either direction. The most natural SETL representation of a directed graph G is a set of ordered pairs [x, y], each such pair representing an edge with starting node x and target node y. It is convenient to represent an undirected graph G in the same way, but in this case the reversed edge [y, x] belongs to G whenever [x, y] belongs to G. This representation also allows us to regard G as a multivalued map: G{x} is the set of nodes connected to x by some edge. The following algorithm makes use of this fact.

Given an undirected graph G, the Eulerian path problem, named after the famous mathematician Leonhard Euler (1707-83), "who calculated as other men breathe," is to traverse all the edges of G exactly once by a single unbroken path p which starts at some node x of the graph and ends at some other node y (which might be the same as x). We can think of such a path, called a Eulerian path, as "using up" edges as it traverses them. Euler used the following argument to determine which graphs contain paths p of this kind. If a node z along p is different from the starting and ending nodes x and y of p, then immediately after p has reached z along one edge p will leave it along some other edge, and thus p will always use up an even number of the edges which touch any node z of p not equal to x or y. The same remark applies to the starting node x if x = y, but if x and y are different then p must use up an odd number of the edges touching x and an odd number of the edges touching y. It follows that a Eulerian path p which traverses all the edges of G just once can only exist if G is connected and either has no nodes x touched by an odd number of edges or has exactly two such nodes x, y; and in this latter case every Eulerian path p must start at one of x, y and end at the other.

Suppose, conversely, that G has either no nodes or exactly two nodes which are touched by an odd number of edges. Then we can construct an Eulerian path p as follows. If every node of G is touched by an even number of edges of G, let x be any node of G; otherwise let x be one of the two nodes x, y of G touched by an odd number of edges. Start the path p at x, and extend p as long as possible by stepping from its endpoint along any edge of G that has not been traversed before. Since we consider an edge to be used up as soon as it is traversed, the construction of p uses up more and more edges of G and therefore must eventually stop. Hence p must be finite. Suppose that p ends at a node y. Clearly all the edges touching y must have been traversed by p, since otherwise p could be extended by some edge. Thus, if the starting node x of p is touched by an odd number of edges, p must end at some other node y which is also touched by an odd number of edges, whereas if x is touched by an even number of edges, then p must return to x and end there. In either case, removing all edges traversed by p from G will leave behind a graph G' each of whose nodes is touched by an even number of edges. If p does not already traverse all the edges of G, then some node z along p will be touched by some untraversed edge. In this case, one can construct a path q by starting from z with this edge and extending q along untraversed edges as long as possible. Since the remarks concerning p apply to q as well, and since q can be regarded as a path in the graph G', and since all of the nodes preceding G are touched by an even number of edges, the path q must both begin and end at z; i.e., q must be a circuit. Hence we can insert q into p, thereby constructing a path which first follows p to z, then follows q until q finally returns to z, and then follows the remainder of p to its end. Call this extended path by the same name p. Repeating the construction and insertion of circuits like q as often as possible, we must eventually build up a path p which traverses all the edges of the original graph G.

The two following procedures realize the Eulerian path construction described in the preceding paragraphs. Procedure build_path starts a new path and extends it as far as possible, deleting (from G) the edges traversed by this path; Procedure Euler_path installs the path sections returned by build_path into the overall Eulerian path that it constructs and returns.

program Euler;    -- Eulerian path construction

    graph := {[1,2], [2,3], [3,4], [4,1], [4,2]};             -- a small graph
    print(euler_path(graph + {[y, x]: [x, y] in graph}));        -- which is undirected.

    procedure Euler_path(G);                -- constructs Eulerian path for graph G

        nodes := domain(G);                        -- all nodes in the graph.
    
        if #(odds := {x in nodes | odd(#G{x})}) > 2 then
            return OM;      -- since more than two nodes are
        end if;             -- touched by an odd number of edges
    
                        -- odds is the set of all nodes of G that 
                        -- are touched by an odd number of edges
        x := arb(odds)?arb(nodes);      -- pick a node of odds if possible
                                        -- otherwise pick any node of G
    
        path := [x] + build_path(x,G);
    
        while exists z = path(i) | G{z} /= {} loop
            new_p := build_path(z, G); -- insert new section into path
            G -:= ({[y,x]: [x,y] in new_p} + {e: e in new_p});
            path := path(i..i - 1) + new_p + path(i..);
        end loop;
    
        return path;

    end Euler_path;

    procedure build_path(x, rw G);        -- builds maximal path section starting at x, 
                                          -- and deletes all edges traversed

        p := [ ];

         while (y := arb G{x}) /= OM loop 
                 -- while there exists an edge leaving the last point reached
            p with:= y;                 -- extend path to traverse the edge
            G -:= {[x, y], [y, x]};        -- delete the edge just traversed
            x := y;                        -- step to y
        end loop;

        return p;

    end build_path;

end euler;

9.2 Topological Sorting

Certain problems, of which scheduling problems are typical, require one to arrange the nodes n of a graph G in a list such that every edge of G goes from a node n₁ to a node n₂ coming later in the list. This is called the problem of topological sorting. Suppose, for example, that a student must choose the order in which he or she will take the courses required to qualify as a computer science major, some of which have other courses as prerequisites. Suppose also that we represent the prerequisite relationship as a set G of pairs, agreeing that whenever course n₁ is a prerequisite of course n2, we will put the pair [n1,n2] into G. Then, mathematically speaking, G is a graph; in heuristic terms, G{n₁} is the set of all courses for which n₁ is a prerequisite. (Note the connection of the topological sorting problem with the transitive computation of prerequisites described in Section 4.3.8.1.)

To sort a collection of courses topologically is simply to arrange them in any order in which they could actually be taken, given that all the prerequisites of each course n must be taken before n is taken. To do this we can simply find some course n which has no (unfulfilled) prerequisites, put n first in the list L, drop all edges [n, n1] from G (since n is no longer an unfulfilled prerequisite), and then continue recursively as long as courses without unfulfilled prerequisites remain. Written as a recursive SETL routine, this is short and direct:

procedure top_sort1(G,nodes);      -- topological sorting procedure, recursive form
        return if exists n in nodes | n notin range(G) then 
            [n] + top_sort1(G lessf n, nodes less n) else [] end if;
end top_sort1;

Invocation of top_sort1(G) will return a tuple t consisting of some or all of the nodes of G. If it is possible to sort nodes of G topologically, then every node of G will appear in t. This will be the case if and only if G admits no cycle of nodes such that

    n₁ is prerequisite to n₂ is prerequisite to n₃ is prerequisite to ... 
              is prerequisite to n_k is prerequisite to n₁.

To see this, note that it is clear that when such a cycle of mutually prerequisite nodes exists, no node in the cycle can ever be put into the tuple t returned by top_sort1. Conversely, if a node n₀ belongs to no such cycle, then eventually top_sort1 will have processed all the predecessors (i.e., prerequisites) of n0, and after this top_sort1 must eventually put n₀ into the tuple t it returns. This shows that the set of all nodes belonging to any cycle like (n₁,n₂,...,n_k,n₁) is simply

       nodes - {x in top_sortl(G,nodes)},

so that (1) can also be used to test a graph G for the presence of cycles.

Like many other tail recursions, i.e., recursive procedures which only call themselves immediately before returning, the topological sort procedure seen above can be rewritten as an iteration (see Section 5.4). Written in this way, the topological sort procedure becomes:

procedure top_sort2(G);         --  first iterative form of topological sort

    nodes := domain(G) + range(G);   --  Here we calculate the set of all nodes; this makes it unnecessary to
                                     --  pass the set of nodes as an additional parameter.

    t := [ ];                    --  initialize the tuple to be returned

    while exists n in nodes | n notin range(G) loop
        t with:= n;
        G lessf:= n;
        nodes less:= n;
   end loop;

   return t;

end top_sort2;

It is possible to improve the efficiency of (3) very substantially by keeping the current value of the set

            {n in nodes | n notin range G}

available at all times. To do this, we proceed as follows:

For each node n, we maintain a count of the number of the predecessors of n which have not yet been put into t.
When n is put into t, we reduce this count by 1 for all nodes nl in Gl~n}.
If count(x) falls to zero, then x becomes a member of the preceding set.

These observations, which could be derived step by step from the more general formal differencing principles discussed in Section 6.5, underlie the following revised form of (3):

procedure top_sort3(G); 
                                        -- second iterative form of the topological sorting procedure
        nodes := (domain G) + (range G);
        count := {};    -- initialize the count function described previously

        ready := nodes; -- The following loop will remove elements that have
                                -- any predecessors from ready
        for [x, y] in G loop
               count(y) := (count(y)?0) + 1;
                ready less := y; -- since y has a predecessor
        end loop;
                 -- At this point 'ready' is the set of all nodes without predecessors

        t := [ ];   -- t is the tuple being built up

        while ready /= {} loop

           n from ready;
           t with:= n;

           for n1 in G{n} loop
               if (count(n1) -:= 1) = 0 then ready with:= n1; end if;
           end loop;

        end loop;

        return t;

end top_sort3;

It is not hard to see that the preceding code examines each edge of the graph G just twice. Thus the time needed to execute this code is linearly proportional to #G.

9.3 The Stable Assignment Problem

Suppose that the members of a population of n students are applying to a collection of m colleges. We suppose also that each student finds a certain collection of colleges acceptable, and that he/she ranks these colleges in order of decreasing preference. Finally we suppose that each college c can admit only a given quota Q(c) of the students who apply to it, and that it is able to rank all the students in order of decreasing preference. We do not suppose that any of these preferences is necessarily related to any other; that is, different students can rank colleges in radically different orders, and different colleges may find quite different types of students preferable. The problem we consider is that of assigning students to colleges in such a way as to satisfy the following three conditions:

No college accepts more than Q(c) students;
A college c never admits a student s₁ if it has filled its quota Q(c) and there exists an unassigned student s₂ to whom college c is acceptable and whom college c prefers to student s₁.
There is no situation in which student s₁ is assigned to college c₁ and student s₂ is assigned to college c₂, but both the students involved and the colleges involved prefer to switch; that is, s₁ prefers c₂ to c₁, s₂ prefers c₁ to c₂, c₁ prefers s₂ to s₁, c₂ prefers s₁ to s₂.

This problem was studied by David Gale and Lloyd Shapley (American Mathematical Monthly, 1962, pp. 9-15), who gave a simple algorithm for finding an assignment satisifying conditions (a), (b), and (c). The algorithm is just this: Each student applies to his first-choice college. Then each college c puts the topmost-ranked Q(c) students who have applied to it on an active list and notifies the others that they have been rejected. All rejected students now apply to their second-choice colleges. Then all colleges rerank their applicants, keep the first Q(c) of these applicants, and again notify the others that they have been rejected. This cycle of reapplication and reranking continues until no rejected students have any more colleges on their list of acceptable colleges. It is clear that the assignment produced by this procedure satisfies condition (a). Condition (b) is also satisfied, since if s₂ finds college c acceptable, he/she will eventually apply to college c and can then bump any student s₁ whom c finds less acceptable, but will never subsequently be bumped except by a student whom c finds more acceptable. Finally, condition (c) is satisfied, since if s₁ prefers c₂ to c₁ he/she must have applied to c₂ before c₁ but been bumped from c₂'s active list by a student that c₂ prefers to s₁. But when this happened c₂'s active list could not have contained any student that c₂ does not prefer to s₁. Therefore, since the students on college c₂'s active list never grow any less attractive from c₂'s point of view, c₂ will never regard any student on its final active list as less desirable than s₂.

The Gale-Shapley iteration only continues as long as some student has just been rejected by the latest college to which they have applied. Since this rejection 'uses up' one of the items on that student's list of acceptable colleges, the nmber of iteerations required can be no more than the sum of the lengths of all these lists, plus 1. The code which follows reflects this fact.

Programmed in SETL, the Gale-Shapley algorithm is as follows.

program gale_shapley; -- Gale-Shapley assignment algorithm
    const A := "A", B := "B", CC := "CC", D := "D";  -- constants designating colleges

    stpref := {[1, [A, B, CC]], [2, [B, CC, A, D]], [3, [CC, A, B]],
                        [4, [B, A, CC]],[5, [B, A, CC, D]]};   -- students' choices
    colpref:= {[A,[1,2,3,4]],[B,[4,3,2,1]],[CC,[2,4,3]],[D,[1,2,4,5]]};
                                            -- colleges' rankings of applicants
    quot := {[A,2], [B,1], [CC,1], [D,2]}; -- size of entering classes

    print(assign(stpref, colpref, quot));        --invoke assignment algorithm
    
procedure assign(stud_pref,coll_pref,quota);         -- Gale_Shapley stable assignment algorithm
        -- stud_pref maps each student into the vector of colleges he/she finds
        -- acceptable, ranked in decreasing order of preference; coll_pref(c)(s1 s2)
        -- is true if college c finds student s1 preferable to student s2, false otherwise.
        -- The map quota gives the number of students each college will accept.
    
    colleges := domain(quota);
    active := {[c,[]]: c in colleges};           -- set up an empty active list for each college
    applicants := domain(stud_pref);             -- initialize the pool of applicants
    maxits := 1 +/ [#colls: [-,colls] in stud_pref];    -- maximum reauired iterations; see preceding  remark
    
    for j in [1..maxits] loop      -- iterate as often as needed
    
        new_applicants := applicants; -- save the set of applicants, which will be iterated over
    
        for s in applicants | stud_pref(s) /= [] loop
                -- each unsatisfied student who has a college to apply to does so
            first_choice fromb stud_pref(s);
            active(first_choice) with:= s;
            new_applicants less:= s;
        end loop;

        applicants := new_applicants; -- bring the set of applicants into its new condition
    
        for c in colleges | #active(c) > quota(c) loop     -- drop all 'over quota' applicants

            active(c) := pref_sort(active(c),coll_pref(c));     -- rerank all who have applied
            
            for k in [quota(c) + 1..#active(c)] loop
                applicants with:= active(c)(k);        -- return student to applicant pool
    
            end loop;

        active(c) := active(c)(1..#active(c) min quota(c));        -- cut back active list
    
        end loop;
    
        if not (exists s in applicants | stud_pref(c) /= []) then exit; end if;

    end loop;

    return [active, applicants];    -- pattern of assignments is complete
                -- we  return  the list of  accepted students, by colleges, 
                -- and the set of students not accepted by any college to which they applied
end assign;

procedure pref_sort(apvect,order);
        -- this returns the current group of applicants in the order of the college's choice.
    applicants := {x: x in apvect};        -- convert to set
    return [x in order | x in applicants] + [x in applicants | x notin order];        -- unranked students come last
end pref_sort;

end gale_shapley;

9.4 A Text Preparation Program

Text preparation programs aid in the preparation of printed material by arranging text in attractively indented, justified, centered, and titled paragraphs and pages. You may well have used some utility program of this type: they are commonly available under such names as Script, Runoff, Troff, etc. In this section, we will describe the internal structure of a simplified version of such a program.

Our program, which we will call Prepare, accepts source text containing embedded command lines as input and reformats the text in the manner specified by the command lines. Command lines are distinguished from text lines by the fact that the former start with a period as their first character, and by the fact that this initial character is followed by a few other characters signifying one of the allowed Prepare commands, as listed later. In its ordinary mode of operation, Prepare collects words from the text it is formatting and fills up successive lines until no additional words will fit on the line being filled. Then the line is right-justified and printed. However, commands can also be used to center a line, and lines can be terminated without being filled (we call this action a break). Text can also be arranged in several special table formats, as described later.

The Prepare program treats any unbroken sequence of nonblank characters as a word. An autoparagraphing feature, which causes every text line starting with a blank to start a new paragraph, is also available. Margins and spacing are controllable by commands. A literal command, which causes text following it to be printed exactly as it stands, is available to override the normal reformatting action of Prepare . Facilities for automatic numbering of sections and subsections are also available. If the activity of Prepare discloses incon- sistencies or errors in the commands presented to it, a file of diagnostic warnings is printed. The formatting commands supported by Prepare are listed in a table below. However, it will be easier to read these commands if you keep in mind the fact that they sense and manipulate the following variables, which are crucial to Prepare's activity:

Variable Name	Meaning
Page_honzontal	Horizontal width of paper
Page_vertical	Number of lines on page
Spacing	Current spacing of lines; 1 = single spacing
Left_margin	Current indentation for left margin
Right_margin	Current right indentation for right margin
Old margins	Saved prior values of margins
Current line	Line of output text currently being built up
Fill	Controls collection of words into current_line
Justify	Switch controlling right justification of output lines/TD>
Line_count	Counts number of lines output so far on current page
Page_number_stack	Stack of page and subpage numbers
Number_pages	Switch for page numbering
Header_number_stack	Stack of section and subsection numbers
Title	Current page title
Subtitle	Current page subtitle
Chapter number	Current chapter number

The commands supported by the Prepare system are as follows:

.BR (BREAK)	Causes a BREAK; i.e., the current line will be output with no justiflcation, and the next word of the source text will be placed at the beginning of the next line.
.S n (SKIP)	Causes a BREAK after which n is multiplied by the number of spaces between lines. The result is the number of lines skipped. Output is advanced to the top of the next page if there is no room on the current page.
.B n (BLANK LINES)	Causes the current line to be output with no justification, skips n line spaces, and then starts output of the current source text. BLANK is like SKIP, except that the space to be left is independent of line spacing.
.FG n (FIGURE)	Leaves n lines blank to make room for a figure or diagram. If fewer than n lines remain on the current page, text continues to fill this page. Then the page is advanced and n blank lines are left at the top of the next page.
.I n (INDENT)	Causes a BREAK and sets the next line to begin n spaces to the right of the left margin.
.P n, v, t (PARAGRAPH)	Causes a BREAK and formats the output paragraphs. The parameter n is optional and, if present, sets the number of spaces the paragraph is to be indented. The default value for n is 5 (n can also have a negative value). v is the vertical spacing between paragraphs. v can range from 0 to 5. (1 denotes single spacing, 2 double spacing, etc.) t causes an automatic TEST PAGE (see the TEST PAGE command).
.C n; text (CENTER)	Causes a BREAK and centers the following text in the source file. The centering is over column n/2 independent of the setting of the left and right margins. If n is not given, it is assumed to be the page width.
.NT caption (START INDENTED NOTE)	Starts an indented note. This command BLANKS 2, increases both margins by 25% of the text space, centers the caption (if no caption is given, it centers the word "NOTE"), and then BLANKS 1. At this point there follows the text of the note.
.EN (END INDENTED NOTE)	Terminates a NOTE command, BLANKS 2, and reverts the margins and spacing modes to their settings before the last NOTE command.
.PG (NEW PAGE)	Causes a BREAK and an advance to a new page. If the current page is empty, this command does not advance the page. Just like an automatic page advance, this command adds the title (if given) and page numbers on every page.
.TP n (TESTPAGE)	Causes a BREAK followed by a conditional page advance. It skips to the next page if fewer than n lines are left on the page. This feature serves to ensure that the following n lines are all output on the same page. This command is also triggered by inclusion of the letter 't' as an optional argument to the PARAGRAPH command.
.NM n (RESTART PAGE NUMBERING)	Starts page numbering. Pages are normally numbered so there is no reason to issue this command unless page numbering is disengaged. If resumption of page numbering is desired at a certain page, specify n.
.NNM (SUSPEND PAGE NUMBERING)	Disengages page numbering. However, pages continue to be counted, so that the normal page number can appear if page numbering is restarted with the NUMBER command.
.CH text (START CHAPTER)	Starts a new chapter using 'text' as the title of the chapter. This command acts as if the following command string were entered: .BREAK; .PAGE; .BLANK 12; .CENTER CHAPTER n The n is incremented by 1 automatically. After the CHAPTER n is typed on the page, .BLANK 2; .CENTER; text;.BLANK 3 occurs. This command then resets the case, margins, spacing, and justify/fill modes. It also clears any subtitles and sets the chapter name as the title.
.NC n (SET CHAPTER NUMBER)	Supplies a number (n) to be used in a subsequent CHAPTER command. NUMBER CHAPTER would be used when a chapter of a document occupies a source file of its own. In such a case, NUMBER CHAPTER would be the first command of the source file.
.T text (DEFINE TITLE)	Takes the remaining text as the title and outputs it on every page at line 0. The default is no title. If a title is desired, this command must be entered in the source file.
.FT text (DEFINE FIRST TITLE)	Same as TITLE, but used to specify the title to be printed on the first page of the document. This command must precede all text in the source file. Use of the FIRST TITLE command is the only way to print a title line on the first page of the document.
.ST text (DEFINE SUBTITLE)	Takes the remaining text as the subtitle and outputs it on every page. A subtitle appears directly under the page title. The subtitle is not indented, but indentation can be achieved by typing leading spaces.
.SP (START SUBPAGE NUMBERING)	Executes a PAGE with page numbering suspended. The page number is unchanged, but letters are appended to the page number. This permits insertion of additional pages within an existing document without changing the existing page numbering.
.ESP (END SUBPAGE NUMBERING)	Disengages the SUBPAGE command by executing a PAGE command with page numbering resumed.
.HD (SWITCH PAGE TITLING ON)	Causes the page header (title, subtitle, and page number) to be printed.
.NHD (SWITCH PAGE TITLING OFF)	Causes the page header (title, subtitle, and page number) to be omitted. The header lines are completely omitted, so that text begins at the top of the page with no top margin.
.J (SWITCH ON LINE JUSTIFICATION)	Causes a break and sets subsequent output lines to be justified (initial setting). The command increases the spaces between words until the last word exactly meets the right margin.
.NJ (SWITCH OFF LINE JUSTIFICATION)	Causes a BREAK and prevents justification of subsequent output lines, allowing a ragged right margin.
.F (SWITCH ON LINE FILLING)	Causes a break and specifies that subsequent output lines be filled. Sets the justification mode to that specified by the last appearance of JUSTIFY or NOJUSTIFY. FILL adds successive words from the source text until addition of one more word would exceed the right margin, but stops before putting in this last word.
.NF (SWITCH OFF LINE FILLING)	disengages the FILL and JUSTIFY modes. This command is used to permit typing of tables or other manually formatted text.
.LIT (PRINT FOLLOWING TEXT LITERALLY)	disengages FILL/ JUSTIFY to permit printing of text exactly as entered in source file.
.ELI (END LITERAL TEXT)	used after LITERAL command to reengage FILL/JUSTIFY.
.LM n (SET LEFT MARGIN)	sets the left margin to n. The n must be less than the right margin but not less than 0. The default setting is 0.
.RM n (SET RIGHT MARGIN)	sets the right margin n. The n must be greater than the left margin. The default setting is 60.
.PV n (SET PAGE HEIGHT)	sets the number of lines on the page to n. The default setting is 58.
.SP n (SET INTERLINE SPACING)	sets the number of spaces between lines. The n can range from 1 to 5. The default setting is 1. SPACING 1 is like single spacing on a typewriter and SPACING 2 is like double spacing. SPACING 2 puts one blank line between lines of text.
.AP (SWITCH AUTOPARAGRAPHING ON)	Causes any blank line or any line starting with a space or tab to be considered as the start of a new paragraph. This command allows normally typed text to be justified without special commands. It does not cause a paragraph if blank lines are followed by a command.
.NAP (SWITCH AUTOPARAGRAPHING OFF	disengages the AUTO- PARAGRAPH mode.

Now we give SETL code for our text preparation system.

program prepare;        -- text preparation system
    use random_pak;
    
    var                                     -- global variables
       last_page_had_output := true,        -- did the preceding page allow any input to be processed?
       Page_horizontal,            -- horizontal width of paper
       Page_vertical,             -- vertical height of paper, in lines
       Spacing,                    -- current spacing of lines
       Left_margin,                -- left margin
       Right_margin,               -- right margin
       Autoparagraph,              -- switch for autoparagraphing
       Tuple_of_words,             -- collects words of input for output
       Justify,                    -- controls right justification
       Fill,                         -- controls filling of lines
       Line_count,                 -- counts number of lines on page
       Page_number_stack,       -- stack of page & subpage numbers
       Number_pages,              -- switch for page numbering
       Main_title,                 -- page title
       Subtitle,                  -- page subtitle
       Print_header,             -- controls header printings
       Fillj_save,              -- saves fill & justify during LIT
       First_page,                 -- switch for first page
       Chapter_number,          -- current chapter number
       Page_advance,              -- page advance character
       Margin_save,                -- saves margins during indented note
       Figure_lines,              -- number of lines reserved for figure
       Figure_flag,              -- switch to leave space for figure
       Page_figure_flag,        -- leaves space for figure on top of next    page
       Indent_flag,              -- switch for indentation
       Para_indent_flag,        -- switch for paragraph indentation               
       Number_blanks,              -- number of spaces to indent
       Paragraph_spacing,       -- current spacing between paragraphs
       Paragraph_indent;        -- number of spaces to indent for paragraph
 
    var ihandle,ohandle,rand_handle;        -- file handles; also handle for random number generatiion                
                        
    const  -- constants designating all commands
       BR := "BR",S := "S",B := "B",FG := "FG",I := "I",P := "P",C := "C",NT := "NT",EN := "EN",TP := "TP",
                NM := "NM",NNM := "NNM",CH := "CH",NC := "NC",T := "T",FT := "FT",SB := "SB",PG := "PG",ESP := "ESP",
                    HD := "HD",NHD := "NHD",J := "J",NJ := "NJ",F := "F",NF := "NF",LIT := "LIT",ELI := "ELI",LM := "LM",
                         RM := "RM",PV := "PV",SP := "SP",AP := "AP",SS := "SS",NAP := "NAP";

    const Legal_ops :=   -- legal commands
          {BR, S, B, FG, I, P, C, NT, EN, TP, NM, NNM, CH, NC, T, FT, SB, PG,
                 ESP, HD, NHD, J, NJ, F, NF, LIT, ELI, LM, RM, PV, SP, AP, SS, NAP};

     const Cause_new_line :=         -- these commands cause Tuple_of_words to be
                                    -- emptied. Text immediately following these
                                    -- commands is output at beginning of new line.
        {BR, S, B, I, C, NT, EN, PG, CH, J, NJ, F, NF, LIT, ELI, LM, RM};

--              ************** GENERAL COMMENTS **************

    -- The text preparation system's main job is to FILL and/or JUSTIFY the text found in its 
    -- source file. The action of the main procedure depends on the FILL and JUSTIFY settings.
    
    -- If both the FILL and JUSTIFY switches are off, text is printed in the same
    -- format as input. In all other cases words of text are broken out of the input
    -- line and placed in a Tuple_of_words. If FILL is off and JUSTIFY is on
    -- (i.e., we are justifying but not filling lines), then Tuple_of_words is printed in justified form
    -- immediately after a line of input is processed. If FILL is on then Tuple_of_words
    -- is filled until one of the following three cases arises:
    
        -- (1) a command line is encountered. (a line begining with a period).
    
        -- (2) a new paragraph is to begin.
    
        -- (3) end of input.
    
    -- However the entire Tuple_of_words is not output at this time. Those words that do not
    -- fill the last line of output remain in Tuple_of_words until a specific command that
    -- causes a new line is encountered, or until case 2 or case 3 is encountered.
    
    -- NOTE: Only if FILL is on can ore than one line of input be held temporarily in Tuple_of_words
    -- before being output 
    
    -- 'Autoparagraphing' is a major feature of the text preparation system.
    -- If the Autoparagraphing switch is on, (it initially isn't), an input line beginning
    -- with a blank causes Tuple_of_words to be emptied, a preset number of lines (initially 1) 
    -- to be skipped, and a preset indentation (initially 5 spaces) to be applied. 
    
    -- Command lines differ from text lines in that they begin with a period.
    
    -- The source text to be processed should be  placed in a file named PREP.IN. 
    -- The file PREP.OUT will then contain the output.
    
    -- Two major types of error are detected by the program:
    
    -- Command error. This is caused by either an invalid command or an error
    -- in the command's parameters. In this case the command is ignored and
    -- an appropriate message is printed.
    
    -- Justify error. If JUSTIFY is on and FILL is off, too many words in a line
    -- of input (initially, more than 60 characters) or a single word will cause
    -- a justify error. This type of error terminates processing.
    
    -- Here are explanations of some of the subtler uses of variables in the following code:
    
    -- Indent_flag/Para_indent_flags: 
    -- Two different flags are used two handle the two following situations.
    -- Indent_flag is used when the command INDENT is encountered.
    -- Para_indent_flag is used in conjunction with the Autoparagraph command. 
    -- Since the INDENT command can be used when Autoparagraph is on, two flags are necessary.
    
    -- Figure_flag/Page_figure_flags: When the FG command is encountered,
    -- Figure_flag is turned on so as to leave a specified number of lines blank
    -- the next time Tuple_of_words is output. If an insufficient number of lines
    -- are left on the current page for the figure, blank lines must be left at the top of
    -- the next page. Page_figure_flag is turned on in this case.
    
    -- First_page. This switch is initially on, but is turned off after the first page
    -- headers have been printed. 
    -- It causes page headers for the first page to be output at the same time that
    -- the first words of tTuple_of_words output.
    
    -- Page headers are not printed during initialization, since we may first want to  execute
    -- commands which will change certain default initializations. 
    -- This is especially necessary in connection with the following commands;
    
    --     Fill      Fill      Fill      Fill -> ON (initialization value)
    --     Justify      Justify      Justify      Justify -> ON
    --     Page number switch      Page number switch      Page number switch      Page number switch -> ON
    --     Page header switch      Page header switch      Page header switch      Page header switch -> ON
    --     Title      Title      Title      Title -> NULL
    --     Subtitle      Subtitle      Subtitle      Subtitle -> NULL
    --     Chapter#      Chapter#      Chapter#      Chapter# -> 1

                -- **** MAIN PROGRAM OF TEXT PREPARATION SYSTEM ****

   initialize;            -- initialize all global variables  and set input and output files

   loop                    -- remain in loop until all text is processed

     geta(ihandle,line);           -- read line of input

     if line = OM then exit; end if;    -- end of text input
                
     if match(line,".") /= "" then    -- a command line

           cmd := break(line," \t"); span(line," \t");        -- break out the comand
       
           if cmd notin Legal_ops then error_proc(cmd); continue; end if;
        
           if Fill then print_lines; end if;
                        -- output the text collected in Tuple_of_words in its correct format. 
                        -- Words that remain in Tuple_of_words are those that do not fill the last line.
        
           command_tuple := command_check(cmd,line);
                   -- 'command check' checks validity of the command line.
                   -- command_tuple contains the command and its parameters.
        
           if command_tuple /= OM then 
               handle_command(command_tuple);  -- 'handle command' carries out the     command.
           else 
               error_proc(cmd);
           end if;
        
           continue;

      end if;                 -- end if match(line,".")

     if not (Fill or Justify) then
          output(line);           -- output line as it was read in
     else                -- if Autoparagraph is on, a blank space at the beginning
                        -- of the paragraph signals for new paragraph.

          if Autoparagraph and line(1) = " " then paragraph; end if;
        
              span(line," ");         -- bypass remaining blanks

              until line = "" loop    -- this loop places words of input into Tuple_of_words
        
                  if (next_word := break(line," ")) = "" then
                       next_word := line;
                  else
                       span(line," ");         -- bypass extra blanks
                  end if;

                  Tuple_of_words with:= next_word;        -- collect the word

              end loop;

     end if;

     if Justify and not Fill then print_lines; end if;
                -- if Fill is off, Tuple_of_words is printed after each line of text is read

   end loop;

finalize;                       -- take final cleanup steps

  -- ***************** END OF MAIN PROGRAM *****************
      

procedure initialize;     -- parameter & file name initialization
        -- this procedure initializes all global variables and opens the input & output files.

        ihandle := open("prep_in","TEXT-IN");            -- open source file
        ohandle := open("prep_out","TEXT-OUT");            -- open output file
         rand_handle := start_random(1000000,OM);        -- open a stream o f random numbers, using current time as seed
         
        Page_advance := "*";            
        Page_vertical := 58;            -- default lines per page
        Page_horizontal := 60;          -- default spaces per line
        Left_margin := 0;               -- default margins
        Right_margin := 60;             -- default margins
        Spacing := 1;                   -- single spacing
        Paragraph_spacing := 1;         -- lines between paragraphs
        Paragraph_indent := 5;          -- default paragraph indentation
        Indent_flag := false;           -- switch that controls indentation
        Para_indent_flag := false;      -- controls indentation of paragraphs
        Figure_flag := Page_figure_flag := false;       
        First_page := true;             -- turned off when first 'page' occurs
        Print_header := true;           -- initially on
        Main_title := Subtitle := "";   
        Autoparagraph := false;         -- initially off
        Tuple_of_words := [];           -- contains words of text to be processed
        Page_number_stack := [1];       -- initially on first page
        Fill := true;                   -- initially on
        Justify := true;                    -- initially on
        Line_count := 1;                -- counts lines on page
        Number_pages := true;           -- page numbering initially on
        Chapter_number := 1;            -- advances with each chapter

end initialize;

procedure page; -- page advance procedure 
            -- This procedure is invoked whenever output proceeds to a new page. This
            -- procedure puts out a line containing a page advance character, then the page
            -- number, title, & subtitle if these are switched on.
    
    if not last_page_had_output then print("PAGE IS TOO SMALL TO HOLD ANY OUTPUT"); stop; end if;
            -- prevent endless looping if page size has  been set too small
            
    last_page_had_output := false;        -- no output yet on this page
 
    printa(ohandle,Page_advance);            -- output a page output a page-advance character
    Line_count := 0;                        -- zero the line count
    First_page := false;                    -- drop the 'isfirst page' flag

    if Number_pages then    -- build up first line with page number, which may involve subpages as in nn.smm.sskk
        Number_line := "PAGE " +/[str(Page_number_stack(j)) + ".": j in [1..#Page_number_stack]];
        Number_line := Number_line(1..#Number_line - 1);         -- drop last dot
        Page_number_stack(#Page_number_stack) +:= 1;             -- advance top page number on  stack
        output(Number_line); output("");                         -- output page number, then empty line
    end if;

    if Print_header then            -- print the title and subtitle, centered
       center(OM,Main_title); center(OM, Subtitle);
       output("");                -- skip 1 line
    end if;

    if Page_figure_flag then          -- leave room for a figure at the bottom of the page
        Page_figure_flag := false;
        blankout(Figure_lines);      -- enough lines for figure
    end if;

end page;

procedure output(line);         -- this is the main output routine of the Prepare program

   if First_page then page; end if;            -- output a page header if the First_page flag is set  
   nblanks := Spacing;     -- number of lines to skip

   if Figure_flag then        -- skip space for figure

       Figure_flag := false;     -- if figure can fit on this page, room is left for it. If there is 
                                -- not enough space, room is left on the top of the next page. 
                                -- Page_figure_flag is used in thelater case.

       if Figure_lines + Line_count + Spacing >= Page_vertical then
           Page_figure_flag := true;            -- set flag to leave space at  bottom of page
        else 
           nblanks := Figure_lines + Spacing;    -- leave space on this page
         end if;

    end if;

    Line_count +:= 1;  -- counts lines on each page

    printa(ohandle,line);
 
    if Line_count >= Page_vertical then page; end if;    -- advance page if page is full
    blankout(nblanks - 1);                        -- print blank lines if double print blank lines if double-  or triple-spacing, etc.

end output;

procedure command_check(cmd,line);   -- breaks command out of line
        -- this procedure checks command and parameter validity; it also sends back
        -- the command and its parameters in a Tuple.

    case cmd

        when  BR, EN, PG, NM, NNM, SP, ESP, HD, NHD,
                        J, NJ, F, NF, LIT, ELI, AP, NAP =>  -- no parameters
                return [cmd];

        when CH, T, FT, SB =>           -- these commands have one string parameter

             if line = "" then return OM; else return [cmd,line]; end if;

        when  S, B, FG, I, TP, NC, LM, RM, SS, PV => -- these commands have one integer parameter

            if (param := integer_check(line)) = OM then 
                return OM;                              -- error encountered
            else 
                return [cmd,param];
            end if;

                        -- the remaining commands are treated separately

        when NT =>

                 if line = OM then                       -- has form 'NT text'. 
                     return [cmd,"NOTE"];   -- If 'text' is OMitted then the word 'NOTE' is used
                else 
                     return [cmd,line];     -- return command with trailing text
                end if;

        when C =>               -- has form 'C n; text'. n is optional

                if match(line,";") = "" then     -- semicolon is not first character, so integer is  present
                     
                     the_int := break(line,";");        -- find the integer
                    if (param := integer_check(the_int)) = OM then return OM; end if;    -- verify
                    span(line," /t");                -- drop trailing whitespace

                    if match(line,";") = "" then return OM; end if;        -- semicolon is missing
                    span(line," ");                    -- drop trailing whitespace

                    if line = "" then return OM; else return [cmd,param,line]; end if;        -- text is required

                else                 -- semicolon is first character, so integer is not present
                   span(line," /t");                -- drop trailing whitespace
                   if line = "" then return OM; else return [cmd,OM,line]; end if;        -- text is required
                end if;

        when P =>               -- has the form 'P n1 n2'

                if (param := integer_check(line)) = OM then return OM; end if;    -- check first integer

                span(line," /t");                -- drop trailing whitespace
                if (param2 := integer_check(line)) = OM then return OM; end if;    -- check second integer
                
                return [cmd, param, param2];     -- return command with both parameters

    end case;

end command_check;

procedure integer_check(rw line);     -- checks validity of string integer

    if (param  := span(line,"-0123456789")) = "" or line = "-" then return OM; end if;
    if match(param,"-") = "" then return unstr(param); else return -unstr(param); end if;

end integer_check;

procedure handle_command(command_tuple); -- command interpeter
        -- this command interpeter handles all Prepare commands.

     [cmd,p1,p2] := command_tuple;
     if (cmd in Cause_new_line) then print_remaining_line; end if;
--print("handle_command: ",cmd);
     case cmd

         when BR =>             -- break command
             return;

         when I =>            -- indent
             Indent_flag := true;
             Number_blanks := p1 max 0 min (Right_margin - 10);

         when NM =>          -- resume page numbering
             Number_pages := true;

         when NNM =>                     -- end page numbering
             Number_pages := false;

         when NC =>                      -- set chapter number
             Chapter_number := p1;

         when T =>                       -- supply title
             Main_title := p1;

         when SB =>                      -- subtitle
             Subtitle := p1;

         when SP =>                      -- start subpage
             page;
             Page_number_stack with:= 1;

         when ESP =>                     -- end subpage
             page;

             if #Page_number_stack > 1 then  -- drop one level
                  junk frome Page_number_stack;
                  Page_number_stack(#Page_number_stack) +:= 1;
             end if;

         when HD =>                              -- print page headers
             Print_header := true;

         when NHD =>                             -- end page headers
             Print_header := false;

         when J =>                               -- begin justification
             Justify := true;

         when NJ =>                              -- end justification
             Justify := false;

         when F =>                               -- fill lines
             Fill := true;

         when NF =>                              -- end filling lines
             Fill := false;

         when PV =>                              -- lines per page
             Page_vertical := p1;

         when LIT =>                             -- suspend fill/justify
             Fillj_save := [Fill, Justify];
             Fill := Justify := false;

         when ELI =>                             -- resume fill/justify
             [Fill, Justify] := Fillj_save;

         when LM =>          -- set left margin
             Left_margin := p1 max 0 min (Right_margin - 10);
             Page_horizontal := Right_margin - Left_margin;

         when RM =>          -- set right margin
             Right_margin := p1 min (Page_horizontal + Left_margin) max (Left_margin + 10);
             Page_horizontal := Right_margin - Left_margin;

         when SS =>         -- set spacing
             Spacing := p1 max 1 min 5;

         when AP =>         -- start autoparagraphing
             Autoparagraph := true;

         when NAP =>                     -- end autoparagraphing
             Autoparagraph := false;

         when P =>                       -- set paragraph spacing
             Paragraph_indent := p1;
             Paragraph_spacing := p2;

         when S =>           -- skip n spacings
             if (p1 * Spacing) + Line_count > Page_vertical then
                  page;
             else
                 blankout((p1 max 0) * spacing);
             end if;

         when B =>           -- skip n lines
             if p1 + Line_count > Page_vertical then
                 page;
             else
                 blankout(p1 max 0);
             end if;

         when FG =>              -- leave lines for figure
              Figure_flag := true;
              Figure_lines := p1;


         when C =>           -- center text
              center(p1, p2);

         when PG =>                       -- start new page if current is not empty
              if Line_count > 0 then page; end if;

         when TP =>                      -- start new page if less than p1 lines remain

              if Line_count + p1 >= Page_vertical then page; end if;

         when CH =>                      -- chapter
              chapter(p1);

         when NT =>                      -- indented note
             blankout(2);                                                -- skip two blank lines
             if p1 = "" then p1 := "NOTE"; end if;                        -- use "NOTE" if caption absent
             center(OM,p1);                                                -- print the note header
             blankout(1);                                                -- skip line
             Margin_save := [Right_margin,Left_margin,Page_horizontal];    -- save the margin settings
             Left_margin +:= (Page_horizontal / 4);                        -- indent left margin 
             Right_margin:= Right_margin - (Page_horizontal / 4);        -- indent right margin
             Page_horizontal := Right_margin - Left_margin;                -- set available space

         when EN =>                      -- end indented note
             blankout(2);                                                -- skip two blank lines
             [Right_margin,Left_margin,Page_horizontal] := Margin_save;    -- restore the margin settings

        otherwise =>
                
            print("****** UNKNOWN COMMAND *****");
 
        end case;

end handle_command;

procedure paragraph;    -- autoparagraph procedure. This procedure is called when a space
                        -- begins a line of input (and AUTOPARAGRAPH is on).

     if Fill then print_lines; end if;          -- first the Tuple_of_words is completely output.
     print_remaining_line;

     Para_indent_flag := true;       -- the next time the Tuple is printed, indentation 
                                     -- will be made. (see procedure 'indenter').
     blankout(paragraph_spacing - 1);        -- write designated number of blank lines  before paragraph 

end paragraph;

procedure blankout(nlines);             -- leaves nlines empty. If the end of page
                                        -- is reached no more lines are blanked out.
    if First_page then page; end if;

     for j in [1..nlines] loop
         Line_count +:= 1;
         printa(ohandle,"");       -- outputs a blank line
         if Line_count >= Page_vertical then page; return; end if;        -- new page, no more blank lines
    end loop;
    last_page_had_output := true;        -- note  this non_recurring output

end blankout;

procedure indenter;     -- if Para_indent_flag or Indent_flag is on, this procedure causes indentation.
                        -- This is done by adding blanks to the beginning of the first word in the Tuple.
                        -- Number_blanks contains the number of blanks to be indented.

     If Tuple_of_words = [] then return; end if;        -- nothing to do in this case
 
     if Para_indent_flag and Indent_flag = false then    -- indent for paragraph only
         Number_blanks := Paragraph_indent;                -- use size of paragraph indent
     end if;

     if Indent_flag or Para_indent_flag then    -- we  indent  only oneline
        Para_indent_flag := false; Indent_flag := false;        -- drop flags, since  indenting this line

                        --prefix the first word in the Tuple with blanks
        Tuple_of_words(1) := (" " * Number_blanks) + Tuple_of_words(1);
     end if;

end indenter;

procedure print_lines;          -- generic printing procedure
     
     indenter;
     printed_lines := if Fill and Justify then fill_and_justifier() else fill_only() end if;

      for line in printed_lines loop          -- Fill on, Justify off
           line := (" " * Left_margin) + line;
           output(line);
      end loop;

end print_lines;

procedure fill_and_justifier;       -- produces output that is filled and justified
    return fill_may_justifier(true);        -- call for fill and justify
end fill_and_justifier;

procedure fill_only;       -- produces output that is filled but not justified
    return fill_may_justifier(false);        -- call for fill with no justify
end fill_only;
  
procedure fill_may_justifier(justify_flag);       -- produces output that is filled and possibly justified

     last_page_had_output := true;        -- some input is processed

    indenter;            -- indent line if indent is on
     processed_lines := [];        -- start to collect justified lines
 
     loop
         nwords := nchar := 0;         -- number of words and of characters in line so far
                
         exited := false;            -- exit flag for following loop
                
         for word in Tuple_of_words loop
               nchar +:= #word + 1; nwords +:= 1;        -- count words and characters collected
              if nchar > (Page_horizontal + 1) then final_word  := word; exited := true; exit; end if;
                                    -- exit if full, but insist on collecting at least one word 
         end loop;
         
        if not exited then 
 
            if not Fill then             -- process last line, even though not yet full
                  processed_lines with:= put_spaces(#Tuple_of_words,Page_horizontal - nchar + 2,Justify);
                Tuple_of_words := [];        -- drop these words, which have been used
            end if;
  
            return processed_lines; 
 
        end if;
         
        temp_line := put_spaces(nwords - 1,Page_horizontal - nchar + #final_word + 2,Justify);
        Tuple_of_words := Tuple_of_words(nwords..);

        if (wd := Tuple_of_words(1)) /=  OM  and #wd > page_horizontal then         -- excessively longword
            Tuple_of_words(1..1) := [wd(1..page_horizontal),wd(page_horizontal + 1..)];    -- split  into parts
        end if;
         
        processed_lines with:= temp_line;
         
      end loop;

end fill_may_justifier;

procedure justifier_error;
     abort("TEXT PREPARATION TERMINATED DUE TO ERROR IN USE OF JUSTIFY");
     stop;
end justifier_error;

procedure put_spaces(nwords,nblanks,justify_flag);        -- may insert spaces into line to force right may insert spaces into line to force right-justification

    space_count  := (nspots := (nwords -  1) max 0) * [1] + [0];
                -- this array will count number of blanks in spots following each word except last
        
    if justify_flag and nspots > 0 then        --  insert  spaces to justify
 
        for j in [1..nblanks] loop        -- insert required total of blanks in random positions
            space_count((random(rand_handle) mod nspots) + 1) +:= 1;
        end loop;

    end if;
                        
    return "" +/ [Tuple_of_words(j) + space_count(j) * " ": j in [1..nwords]];        -- now collect words

end put_spaces;

procedure print_remaining_line; -- procedure prints remaining line from  Tuple_of words

    if Tuple_of_words = [] then return; end if;
    temp_line := (" " * Left_margin);
    word fromb Tuple_of_words;

    while word /= OM loop
        temp_line +:= word + " ";
        word fromb Tuple_of_words;
    end loop;

    output(temp_line);

end print_remaining_line;

procedure center(n,text);   -- center text on column n

    if n = OM then n := (Page_horizontal / 2) + Left_margin; end if;
    n := (n - #text/2) max 0;
    line := (n * " ") + text;
    output(line);

end center;

procedure chapter(text);                -- new chapter is to begin, chapter headers are printed.

        Print_header := false;
        page;
        blankout(3);
        center(OM, "CHAPTER " + str(Chapter_number));

        Chapter_number +:=1;
        blankout(2);
        center(OM, text);
        blankout(3);
        Main_title := text; -- set the title to the chapter text

                        -- the following are reset to their initial values
        Print_header := true;
        Subtitle := "";
        Spacing := 1;
        Justify := true;
        Fill := true;
        Left_margin := 0;
        Right_margin := 60;
        Page_horizontal := 60;

end chapter;

procedure error_proc(cmd); -- prints out error message
        print("ERROR ENCOUNTERED WITH COMMAND: ", cmd, " - COMMAND IGNORED");
end error_proc;

procedure finalize;    -- finalize system, first print what's left in the tuple.

        if Fill then print_lines; end if;
        print_remaining_line;
        close(ihandle);
        close(ohandle);

end finalize;

end prepare;

9.5 A Simplified Financial Record-keeping System

Next we will give SETL code representing some small part of the operations of a bank, albeit in simplified form. The system to be represented corresponds in a rough way to the "Checking Plus" service offered by Citibank in New York City. Note, however, that the simple code shown does not deal adequately with all the anomalies and error conditions that a full-scale banking system would have to handle, nor does it support all the functions that are actually required. For example, the code we give does not provide any way for customer accounts to be opened or closed. A more ambitious commerical application showing how such matters can be treated would be very instructive, but since the issues that enter into the design of a full-scale commercial systcm can grow to be quite complex, we will not attempt to discuss the whole interesting range of questions that enter into the design of such systems.

The simplified system which we consider is aware of a collection of customers, each of whom has an account. A customer's account consists of two parts: a balance representing funds available to the customer, and an overdraft debit representing the amount that the customer has drawn against the Checking Plus feature of his/her account. This debit is limited for each account not to exceed a given credit_limit, established when the account is opened. The bank pays 5% per annum daily interest on positive balances in checking accounts, and charges 18% per annum daily interest on overdraft debits. Like most commercial application programs, the following code maintains a data base, i.e., a collection of maps which collectively represents the situation with which the program must deal, and reads a transaction file whose entries inform it of changes in this situation. Using these files it produces various output documents, for example, lists of checks deposited for transmission to other banks, and monthly statements which are mailed to customers.

The transactions supported by our simplified system are as follows:

Transaction Code	Explanation
deposit (D)	Customer deposits either cash, a check drawn on another bank, or a check drawn on this bank.
withdrawal (W)	A customer appears at a teller's booth and attempts to withdraw cash.
payment (PA)	Customer transfers a stated sum from his available balance to reduce his overdraft debit.
presentation (P)	Check is presented by another bank for payment.
clear (C)	Another bank informs this bank that a check has cleared for payment.
return (R)	A previously deposited check, sent to another bank for payment, is returned either as a bad check or for lack of available funds. (Checks written without sufficient funds cause their writer account to be debited $10.00).
end of day (DAY)	End of banking day has arrived, daily interest is to be credited/debited to all accounts.

On the last day of each month, an end_of_day transaction triggers the production of bank statements which are sent to each customer. On the last day of December, this statement includes an indication of interest charged and interest earned during the year.

Each transaction handled is represented by a single line (string) in the transaction file. This line always starts with a code letter identifying the transaction and for the rest consists of various fields, separated by blanks.

The fields expected for the various transactions supported are as follows:

D	customer number,amount,bank_number,account_number (missing if cash deposit)
W	customer_number,amount,teller_terminal_number
PA	customer_number,amount
P	customer_number,amount,check number,bank number
C	check number
R	check number reason
DAY	(None)

The global data structures used to support our simplified banking system are as follows:

cust_info

This map sends each customer_number into the record maintained for the corresponding customer.

The components of a customer record are

balance available	balance currently available
balance_deposited	balance showing checks deposited but not yet cleared
overdraft_debit	amount currently drawn against "Checking Plus"
overdraft_limit	maximum overdraft allowed
transactions_this_month	list of all completed transactions this month
interest earned	total interest earned this year
interest_paid	total interest paid for overdrafts this year
name	customer name
social security number	customer social security number
address	customer address
telephone number	customer telephone number
bank_info	This map sends the numerical code of each bank from which checks will be accepted into the bank's address information.
pending_checks	When checks deposited are sent along to another bank for confirmation of payment, they are issued unique numerical identifiers. This map sends each such identifier into the transaction to which it corresponds.

Having now outlined all the transactions which our simplified banking system will support and listed the principal data structures which it uses, we are in position to give the code itself

program bank_checking;   -- simplified check simplified check-processing program
    use sort_pak;
    
        -- ******* DECLARATION OF GLOBAL VARIABLES AND CONSTANTS *******

    var         -- global variables
        Cust_info,        -- maps account number into customer record
        Bank_info,        -- maps bank number into bank address, etc.
        Pending_checks,    -- maps each suspended transaction number into a detailed transaction record
         This_banks_code,    -- code identifying this bank
        Check_counter,    -- counter identifying checks sent to other banks for verification
        Message_list,    -- maps each bank identifier into a list of messages to be sent to the bank.
        Bad_transactions,    -- accumulated list of bad transactions
        Transfile,    -- file of transactions, to be processed
        Last_day;    -- last day for which 'DAY' operation was run

    var transaction_handle;            -- handle for transactions file 
    var is_January;                    -- flag for first month of year
    var simulated_date := OM;        -- nominal date in banking simulation

    var traversed:= {};                    -- code points traversed 

    const                       -- strings indicating transaction results

         cash_dep := "CASH_DEP",                 -- cash deposit
         cash_withdrawal := "CASH_WITHDRAWAL",     -- cash withdrawal
         payment := "PAYMENT",                    -- payment of check
         ckplus_payment := "CKPLUS_PAYMENT",     -- "checking "checking-plus" payment
         deposit := "DEPOSIT",                    -- check deposited
         overdrawn := "OVERDRAWN",                -- charge for overdrawn check
         nofunds := "NOFUNDS",                    -- funds not available to pay check
         bad_check := "BAD_CHECK";               -- check drawn on nonexistent account

    const        -- constants designating transaction codes
        D := "D",W := "W",PA := "PA",P := "P",C := "C",R := "R",DAY := "DAY";

    const Transaction_codes := {D, W, PA, P, C, R, DAY};
            -- constants designating transactions.

    const Involves_customer := {D, W, PA, P};
            -- transactions whose second parameter is a customer number.

    const Needs_updating := {D, W, PA, P, C, R};
              -- transactions which modify customer record.

    const digits := "0123456789"; -- the decimal digits

    const Annual_rate := 6,    -- interest paid on checking balances
    Overdraft_rate := 18;    -- interest charged on overdrafts

                -- ****** MAIN PROGRAM OF BANKING SYSTEM ******
--setup;  stop;
initialize_system;    
    -- call initialization procedure to read in all required global data structures.

loop

    geta(transaction_handle,transaction);    -- read next transaction
    if transaction = OM then exit; end if;    -- all transactions processed
    if #transaction = 0 then continue; end if;        -- skip blanks
    if transaction(#transaction) = "#" then exit; end if;        -- exit  on endmarks
    if "#" in transaction then rbreak(transaction,"#");  rspan(transaction,"# \t"); end if; 
    process_transaction(transaction);        -- otherwise process transaction

end loop;

finalize_system;     -- write state of system to output file
print; print; print("END OF TRANSACTION PROCESSING"); print("Traversed: ",merge_sort(traversed));

procedure seen(pt);            -- a code a code-point is traversed
    traversed with:= pt;
end seen;

procedure setup;            -- set up initial data

    input_file := "banking_input_file"; Transfile := "transactions_file";
    input_handle := open(input_file,"BINARY-OUT");   -- open the input file for reading.
    This_banks_code := "Bank_0001"; Check_counter := 0;
    
    balance_available := 100000;            -- cust 1 info
    balance_deposited := 100000;
    overdraft_debit := 0;
    overdraft_limit := 100000;
    transactions_this_month := [];
    interest_earned := 0;
    interest_paid := 0;
    name := "Jack";
    soc_sec_no := "098-90-1888";
    address := "67 Broome St., NYC,NY,10011";
    tel_no := "212-998-1111";
        
    cust_1_info :=  [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, interest_earned,
        interest_paid, name, soc_sec_no, address, tel_no];
    
    balance_available := 200000;            -- cust 2 info
    balance_deposited := 200000;
    overdraft_debit := 0;
    overdraft_limit := 200000;
    transactions_this_month := [];
    interest_earned := 0;
    interest_paid := 0;
    name := "Jill";
    soc_sec_no := "089-09-2777";
    address := "76 Groom St, NYC, NY, 10211";
    tel_no := "212-678-1234";
    
    cust_2_info :=  [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, interest_earned,
        interest_paid, name, soc_sec_no, address, tel_no];
        
    Cust_info := {["001-001-0001",cust_1_info],["001-001-0002",cust_2_info]};
    Pending_checks := {}; 
    
    bank_name := "Prettybank";                -- bank 1 info
    bank_address := "101 Wall St, NYC, NY, 10001";
    bank_tel :=  "212-444-4444";
    
    bank_1_info := [bank_name,bank_address,bank_tel];
    
    bank_name := "Time Savings Bank";                -- bank 2 info
    bank_address := "201 Tall St, Bklyn, NY, 20022";
    bank_tel :=  "718-666-0000";
    
    bank_2_info := [bank_name,bank_address,bank_tel];
    
    Bank_info := {["Bank_0001",bank_1_info],["Bank_0002",bank_2_info]};
    Last_day:= "01/27/101";
    putb(input_handle,This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,Last_day);
    close(input_handle);       -- now finished with input file; release it 
    input_file := "banking_input_file"; Transfile := "transactions_file";
    input_handle := open(input_file,"BINARY-IN");   -- open the input file for reading.
    getb(input_handle,This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,Last_day);
    close(input_handle);       -- now finished with input file; release it 
    print(This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,Last_day);

end setup;

procedure process_transaction(t); 
            -- the principal transaction the principal transaction-processing procedure.
print("\ntransaction: ",t," "); 

    if (dec := decode_transaction(t)) = OM then return; end if;        -- since transaction is bad.
    [code,cust_number,amount,p4,p5] := dec;    -- get fields of transaction
    number:= cust_number;        -- for the 'C' and 'R' transactions,wich use a  generated transaction serial number

if code in Involves_customer then

    [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, interest_earned,
        interest_paid, name, soc_sec_no, address, tel_no] := Cust_info(cust_number)?[];     -- obtain fields of customer record
                                                -- empty record signals that customer is bad
            -- the vector of items constituting a customer's record. Note that all
            -- amounts are kept as integer numbers of pennies.
            -- make balance_available, balance_deposited, overdraft_debit, overdraft_limit, etc. available.
--print("Involves_customer: ",cust_number," ",Cust_info(cust_number)," ",balance_deposited);
end if;

case code

    when D =>             -- deposit

        if p4 = OM then
seen("DA");
            balance_available +:= amount;     -- deposit is cash: accept it immediately        
            balance_deposited +:= amount;
            transactions_this_month with:= post(CASH_DEP,amount);  

        elseif p4 = This_banks_code then         -- check is drawn on this bank
seen("DB");
                -- We handle a check drawn on this bank as a combination of a 'P' transaction
                -- with the transaction (either 'C' or 'R') that responds to this 'P' transaction.
                -- For this, it is convenient to allow this procedure to call itself recursively 
            balance_deposited +:= amount;
            Cust_info(cust_number) := [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, 
                interest_earned, interest_paid, name, soc_sec_no, address, tel_no];

            process_transaction("P " + p5 + " " + dollar(amount) + " 0 " + This_banks_code);
                            -- generate a  payment transaction for the account  on  which the check is drawn
            pending_checks("0") := t;  -- for consistency, note that pending check is drawn on this bank
    
            result := Message_list(This_banks_code)(1);        -- get the result  of this payment transaction
            Message_list(This_banks_code) := [ ];     -- get result and clear message list

            process_transaction(result);             -- process the resulting 'C' or 'R'; check OK or refused
            return;                                 -- since all steps of transaction are now complete

        else         -- the check is drawn on another bank. Note, but do not credit, the deposit.
seen("DC");
            balance_deposited +:= amount;
            identifier := str(Check_counter +:= 1);
            Pending_checks(identifier) := t;     -- save transaction for later completion.
            Message_list(p4) with:= "P " + p5 + " " + dollar(amount) + " " + identifier + " " + This_banks_code;
                           -- send notification to bank on which the check is drawn

        end if;

    when W =>    -- withdrawal

        if ok_withdraw(amount,balance_available,overdraft_debit,balance_deposited,overdraft_limit) then
seen("WA");
            send_teller(p4,"PAYMENT APPROVED");
            transactions_this_month with:= post(CASH_WITHDRAWAL,amount);
        else
seen("WB");
            send_teller(p4, NOFUNDS);
        end if;

    when PA =>            -- payment of portion of overdraft debit

        will_pay := amount min balance_available min overdraft_debit;
         if will_pay = 0 then return; end if; -- bypass transaction  if payment  = 0
seen("PAB"); 
        balance_available -:= will_pay;
        balance_deposited -:= will_pay;        -- update balance_deposited
        overdraft_debit -:= will_pay;
        transactions_this_month with:= post(CKPLUS_PAYMENT,will_pay);

    when P =>            -- presentation (for approval) of check by other bank

        if (c_info := Cust_info(cust_number)) = OM then        -- check is bad
seen("PA"); 
             Message_list(p5) with:= "R " + p4 + " X " + BAD_CHECK;
               -- note: the "reason" parameter is forced into fourth position by the inserted X
            return;                -- abort transaction
        end if;

        [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, 
                interest_earned, interest_paid, name, soc_sec_no, address, tel_no] := c_info;
                          -- make fields of customer info available

        if ok_withdraw(amount, balance_available, overdraft_debit,balance_deposited,overdraft_limit) then
seen("PB"); 
            Message_list(p5) with:= "C " + p4;    -- confirm clearance
            transactions_this_month with:= post(PAYMENT, amount);

        else        -- check must be refused for insufficient funds
seen("PC"); 
            Message_list(p5) with:= "R " + p4 + " X " + NOFUNDS;
                   -- note: the "reason" parameter is forced into fourth position by the inserted X.
                -- in this case the customer is charged a $10.00 fee, or whatever smaller amount remains in his account
            charge := 1000 min (balance_available + overdraft_limit - overdraft_debit) max 0;
            assert(ok_withdraw(charge,balance_available,overdraft_debit,balance_deposited,overdraft_limit));
            transactions_this_month with:= post(OVERDRAWN, charge);
        end if;

    when C =>    -- pending check clears
seen("C");
        assert(dec := decode_transaction(Pending_checks(number)))/= OM;

            -- We can make this assertion because the system represented here does not
            -- allow customer accounts to be closed. However, this assertion would
            -- continue to hold true even in a more realistic system, since in such a system
            -- we would not close an account until all its outstanding deposit transactions
            -- have been completed.

        Pending_checks(number) := OM; -- drop from pending list
        [-, cust_number,amount] := dec;  -- get customer number and amount
        [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, 
                interest_earned, interest_paid, name, soc_sec_no, address, tel_no] := Cust_info(cust_number);
        balance_available +:= amount; -- credit to available balance
        transactions_this_month with:= post(PAYMENT, amount);

    when R =>        -- pending check fails to clear
seen("R");
        reason := p4;
            -- in this, case the p4 field contains the reason for refusal of the check transmitted for approval

        assert((dec := decode_transaction(Pending_checks(number)))/= OM);
               -- see comment following case(C)

        Pending_checks(number) := OM; -- drop from pending list
        [-,cust_number,amount] := dec;  -- get customer number and amount
        [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, 
                interest_earned, interest_paid, name, soc_sec_no, address, tel_no] := Cust_info(cust_number);
        balance_deposited -:= amount;     -- debit the estimated total of deposits.
        balance_deposited -:= (1000 min balance_deposited);         -- apply bad apply bad-check charge
        balance_available -:= (1000 min balance_available);     -- apply bad apply bad-check charge
        transactions_this_month with:= post(reason, amount);

    when DAY =>    -- end of banking day: take end end of banking day: take end-of-day, and if necessary end-of-month, actions.
        end_of_day;        -- take end of day actions 
 seen("DAY");
        if day_field(daystring()) = "01" then
seen("MONTH");
            end_of_month;
        end if;

    otherwise =>        -- have some system error.
                        -- take end_of day action, save system, and note error.

        print("SYSTEM ERROR *** ILLEGAL TRANSACTION:", t);
        finalize_system;
        stop;

    end case;

    if code in Needs_updating then             -- customer information must be updated
        Cust_info(cust_number) := [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, 
                interest_earned, interest_paid, name, soc_sec_no, address, tel_no];

        print("Updated: ",cust_number," ",Cust_info(cust_number));
    end if;

    print ("MESSAGE LIST", Message_list);

end process_transaction;

procedure ok_withdraw(amount,rw bal_avail,rw over_debit,rw bal_deposit,over_limit);
    -- This auxiliary procedure checks to see whether the stated 'amount' can
    -- be withdrawn from an account, by increasing the overdraft debit if
    -- necessary. If so, the balance available, amount provisionally on deposit,
    -- and overdraft debit are appropriately adjusted, and true is returned;
    -- otherwise false is returned.

    if amount > (bal_avail + over_limit - over_debit) then          -- no good
        return false;
    end if;

    bal_avail -:= (amt_frm_bal := amount min bal_avail);
    bal_deposit -:= amt_frm_bal;

    over_debit +:= (amount - amt_frm_bal);
    return true;

end ok_withdraw;

procedure post(trans_type,amount);        
    -- This auxilliary routine converts transactions into strings consisting of an
    -- amount, a coded indicator of the transaction type, and a date; the result is
    -- suitable for printing in a customer's end suitable for printing in a customer's end-of-month statement.

    return daystring() + " " + trans_type + " " + dollar(amount);
end post;

procedure decode_transaction(t);   -- decodes string form of transaction
    -- This procedure reads the string form of a transaction and decodes it into
    -- the various blank the various blank-separated fields of which it consists. It verifies that each
    -- field has the expected type. If any field is found to be bad, or if any field is
    -- missing, then the transaction is posted to a "rejected transactions" list, and
    -- this procedure returns OM.

    -- Otherwise, a tuple c consisting of the converted fields is returned.

    -- Map from transaction type to pattern of fields expected for transaction.
    -- See procedure  See procedure -field_check-, below, for an explanation of the codes
    -- appearing here.

    const XCABX := "XCABX",XCAX := "XCAX",XCA := "XCA",XXAXX := "XXAXX",
                        XX := "XX",XXXX := "XXXX",X := "X";        -- checkstring constants

    const Check_strings 
        := {[D,XCABX],[W,XCAX],[PA,XCA],[P,XXAXX],[C,XX],[R,XXXX],[DAY,X]};

    savet := t;       -- save original form of transaction string
    decoded_trans := [];     -- tuple for decoded form of transaction
    nfield := 1;      -- counter for field number
    check_string := "T";   -- check character for first field is 'T'

    while t /= "" and nfield <= #check_string loop

        span(t," "); -- span off blanks
        if (raw_field := break(t," ")) =  "" and nfield = 4 and decoded_trans(1) = "D" then exit; end if;
                    -- special case of cash deposit; just 3 fields
        
        if (field := field_check(raw_field,check_string(nfield))) = OM then
print("Bad_transaction: ",savet," ",t);    
            Bad_transactions with:= savet;
            return OM;
        end if;

        -- If the first field has just been decoded, use it to determine what further checks are necessary.
        if nfield = 1 then check_string := Check_strings(field); end if;

        decoded_trans with:= field;    
        nfield +:= 1;

    end loop;

        -- Check that all required fields, and no others, are present.

    if #decoded_trans = #check_string
            or (decoded_trans(1) = D and #decoded_trans = 3) then
        return decoded_trans;
    end if;

print("Bad_transaction: ",savet," ",t);    
    Bad_transactions with:= t;   -- otherwise missing or superfluous fields
    return OM;

end decode_transaction;

procedure field_check(field,test_char); -- auxiliary test/convert procedure

    -- This procedure checks the  This procedure checks the -field- passed to it for conformity with the
    -- expected field type, which is described by its  expected field type, which is described by its -test_char- argument.
    -- The allowed test_char characters, and their significance, are as follows:

        -- 'T': must be transaction code
        -- 'X': no test required
         -- 'C': must be customer account number
         -- "A" : must be dollar amount
         -- "B" : must be identifier of correspondent bank

    -- If the test fails, then OM is returned; if the test succeeds, and the field type is
    -- "A" , then the field is converted from standard DDDD.CC 'dollars and cents'
    -- form to an integer number of cents,

    case test_char

        when "T" => return if field in Transaction_codes then field else OM end if;
        when "X" => return field;
        when "C" => return if Cust_info(field) = OM then OM else field end if;

        when "A" => dollars := span(field, Digits);
                    if match(field,".") = OM then return OM; end if;
                    cents := span(field, Digits);
                    if #cents /= 2 or field /= "" then return OM; end if;
                    return 100 * unstr(dollars) + unstr(cents);

        when "B" => return if field /= This_banks_code and Bank_info(field) = OM then
                                     OM else field end if;
        otherwise => return OM;

    end case;

end field_check;

procedure initialize_system; -- system initialization code

    -- First we acquire the name of the input file for this run of the banking system,
    -- which is supplied as a command which is supplied as a command-line parameter;
--    input_file := getspp("OLD = OLD.DAT/OLD.DAT");

    -- Next we read the code for this bank, the pending transaction counter, the
    -- master customer file, the bank address file, and the last previous processing
    -- date, from the specified input information file.
    input_file := "banking_input_file"; Transfile := "transactions_file";

    input_handle := open(input_file,"BINARY-IN");   -- open the input file for reading.
    getb(input_handle,This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,Last_day);
    close(input_handle);       -- now finished with input file; release it 

            -- Next various subsidiary initializations are performed.
    Bad_transactions := [ ];    -- list of bad transactions is empty

    Message_list := {[bank, [ ]]: x = Bank_info(bank)}; 
            -- start an empty message file for each correspondent bank

--    Transfile := getspp("TRANS = TRANS.DAT/TRANS.DAT");
    transaction_handle := open(Transfile,"TEXT-IN");    -- open file of transactions

end initialize_system;

procedure finalize_system;        -- end end-of-run 'dump' procedure
return;
    -- First we acquire the name of the output file for this run of the banking
    -- system, which is supplied on the command line.

    output_file := getspp("NEW = NEW.DAT/NEW.DAT");
    output_handle := open(output_file, "BINARY-OUT"); -- open the output file for writing.

    -- Next we write the code for this bank, the pending transaction counter, the
    -- master customer file, and the bank file to be specified output file

     putb(output_handle,This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,daystring());

    close(output_handle);      -- now finished with output file; release it(See Section 9.1).

end finalize_system;

procedure send_teller(terminal_no, msg);
    -- In an actual system, this procedure would send the message  In an actual system, this procedure would send the message -msg- to the
    -- teller terminal identified by  teller terminal identified by -terminal_no-. Since it is not easy to use SETL
    -- to send messages to more than one terminal, we simplify this procedure
    -- drastically and simply print  drastically and simply print -msg-, with an indication of the number of the
    -- terminal to which msg should actually be sent.

    print(msg," has been sent to terminal ",terminal_no);
end send_teller;

procedure end_of_day;       -- end of day procedure
    -- This procedure is called at the end of each banking day. In practice, it would
    -- write out a collection of files, including the following:

    -- (a) for each bank with which this bank does business, a file of messages, each
        -- representing either a
        -- (i) confirmation that a check transmitted for approval was actually approved;
        -- (ii) rejection of a check, with an indication of the reason for rejection;
        -- (iii) request for approval of a check,
    -- (b) a list of bad transactions, for visual inspection and possible reentry.

    -- We begin by crediting interest payments and making interest charges for all
    -- customers.

    -- First check to ensure that interest has not already been credited today.
    if daystring() /= Last_day then

        for [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, 
                interest_earned, interest_paid, name, soc_sec_no, address, tel_no] = Cust_info(cust_number) loop

            interest_earned +:= (earned := (balance_available * Annual_rate) / 36500);
            balance_available +:= earned; balance_deposited +:= earned;

            -- Next, make charges on the customer's overdraft debit
            interest_paid +:= (owed := (Overdraft_debit * Overdraft_rate) / 36500);

            -- Draw this interest out of the account if possible. If not enough remains,
            -- interest will be charged as an overdraft, even though this causes the actual
            -- overdraft to exceed its stated limit.

            if not ok_withdraw(owed,balance_available,overdraft_debit,balance_deposited,overdraft_limit) then
                       -- run an "excess overdraft"

                overdraft_debit +:= owed - balance_available;
                balance_deposited -:= balance_available;
                balance_available := 0;

            end if;

            Cust_info(cust_number) := [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, 
                interest_earned, interest_paid, name, soc_sec_no, address, tel_no];

        end loop;

        is_January := (month(daystring()) = "01");

    end if;

        -- Write a file of messages for each bank with which this bank does business.

     for bank_inf = Bank_info(code) | code /= This_banks_code loop

        write_message_file(bank_inf,Message_list(code));
        Message_list(code) := [ ]; -- clear the message list to avoid resending.
    end loop;

            -- Write out the file of bad transactions.
    write_bad_transactions(Bad_transactions);
    Bad_transactions := [ ];  -- clear the list of bad transactions

end end_of_day;

procedure write_message_file(bank_inf, mess_list);
    -- In a realistic system, this procedure might write a list of messages to a file, possibly on  
    -- magnetic tape, which was then sent by internet, air express, or special courier to one of
    -- the banks with which this bank does business. However, in our simplified
    -- system, we simply print out a  system, we simply print out a -bank_inf- as a header and follow it by the
    -- individual messages of mess_list.

    print; print; print(bank_inf); print; print;

    for m in mess_list loop print(m); end loop;

end write_message_file;

procedure write_bad_transactions(list);

        -- In a realistic system, this procedure might write its list of transactions to an
        -- on on-line disk file, which would then be scrutinized and manually edited,
        -- reference being made if necessary to the original handwritten or typed
        -- document which first ordered the transaction. However, in our simplified
        -- system, we simply print out the list of bad transactions.

    print; print("BAD TRANSACTION LIST"); print; 

    for m in list loop print(m); end loop;
    
    print("END OF BAD TRANSACTION LIST"); print; 

end write_bad_transactions;

procedure month(dstg); return dstg(1..2); end month;        -- extract  month from daystring

procedure end_of_month;            -- end end-of-month procedure
        -- This procedure, called on the last day of each month, prepares a monthly
        -- statement for each customer. If the month is January, a statement of total
        -- interest charged/earned appears on the statement, and the accrued interest
        -- fields in the customer record are cleared.
--print("end_of_month:",Cust_info); 
    if daystring() = Last_day then return; end if;    -- since statements have already been prepared.

    for [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, 
                interest_earned, interest_paid, name, soc_sec_no, address, tel_no] = Cust_info(cust_number) loop

        print; print(name," ",soc_sec_no); print(address);
        print("Monthly Statement as of ",(ds := daystring())(1..2),"/",ds(3..4),"/",ds(5..)); print;
        for trans in transactions_this_month loop print(trans); end loop;

        transactions_this_month := [ ];

        if is_January then        -- test for January
            print;
            print("SAVE THIS STATEMENT-IT CONTAINS VALUABLE TAX INFORMATION");
            print;
            print("Interest earned: ",interest_earned);
            print("Interest paid: ",interest_paid);
        end if;

    end loop;

end end_of_month;

procedure dollar(amt);     -- converts numerical amount to dollar
    return str(amt / 100) + "." + if #(cts := str(amt mod 100)) = 2 then cts else cts + "0" end if;
end dollar;

procedure day_field(stg); return stg(3..4); end day_field;        -- get day from date string

procedure daystring();         -- extracts month_day_year from nominal date

    dat := date();         -- get the calendar date
    return dat(1..2) + dat(4..5) + dat(7..9);        -- simplifies full date

end daystring;

end bank_checking;

9.6 A Turing-Machine Simulator

Turing machines, named after the famous English mathematician and computer scientist Alan Turing, are the most elementary kind of computer, so elementary that they are not used in any practical way but merely serve as idealized models of computation at its simplest. Used in this way, they play an important role in theoretical investigations of the ultimate limits of computability. A significant fact about these very simple computing mechanisms is that they can be programmed to imitate the action of any other computer; for example, a Turing machine can be programmed to take the text of any SETL program and print out its result. Turing machines consist of two basic parts: a tape and a read-write head. The tape is a linear array of squares, infinite in both directions. In a tape square, the automaton can print any character chosen from a finite collection called the tape alphabet of the Turing machine. All but a finite number of squares on the tape are always blank. At the start of each cycle of operation of the Turing machine, its read-write head is positioned at one of the tape squares and is in one of a finite collection of possible internal states s. The read-write head then reads the character c held in the square at which it is positioned and performs three actions, all determined by the character c which has just been read and the internal state s of the read-write head:

Some new character c' is written into the tape square at which the read-write head is positioned, replacing the character c that was there;
The read-write head passes into a new internal state s';
The read-write head either moves one step right, or one step left or remains where it is.

Plainly, these actions of the Turing machine can be defined by a map action(c, s), whose two parameters are a tape character c and an internal state s, and whose value is a tuple [c',s',n'], consisting of the tape character c' that will overwrite c, the new internal state s' of the read-write head, and an indicator n of the direction of head motion, which must be either + 1 (move right),-1 (move left), or 0 (don't move). The following procedures read in the description of a Turing machine, check this description for validity, read in the initial contents of the Turing machines' tape, and then proceed to imitate its actions. The tape is represented by a tuple 'tape' whosej-th component is the character written in thej-th square. Blank squares contain the blank character. The Turing machine stops when it reaches an internal state s such that action(c,s) is undefined. We assume that the Turing machine description read in initially is a set of quintuples [c,s,c',s',n'], each representing an action-map entry [[c,s],c',s', n]. This description is checked to verify that the action map it describes is really single-valued. The auxiliary procedure print_tape prints the contents of the Turing machine tape after each cycle of operation.

program Turing_simulate;   -- Turing machine simulator

    if (atps := read_check()) = OM then stop; end if;  -- illegal specification               

   [action,tape,position,state] := atps; -- unpack action table, 
                 -- initial tape, initial position, and nitial state and change internal state

    while (act := action(tape(position), state)) /= OM loop         -- until stop
    
        [tape(position),state,n] := act;        -- write new character to tape
    
      if (position +:= n) < 1 then        -- moved left to brand moved left to brand-new square
            tape := [" "] + tape;         -- add blank to left of tape
            position := 1;                -- and adjust position pointer
        elseif position > #tape then    -- moved right to brand moved right to brand-new square;
            tape with:= " ";            -- add blank to right of tape
        end if;
    
        print_tape(tape, position, state);
    
    end loop;

    print("Simulation ended. Character and state are: ",tape(position)," ",state);

procedure read_check;        -- reads and checks action table, tape, initial position, and initial state

    in_handle := open("turing_data","TEXT-IN");    
    reada(in_handle,actuples,tape,position,state);
    action := {[[c,s],[c2,s2,n]]: [c,s,c2,s2,n] in actuples};
    not_single := false;

    for im = action{cs} | #im > 1 loop    -- action is not single action is not single-valued

        not_single := true;
        print();
        print("action is indeterminate in condition", cs);
        print("actions could be:");

        for [c2, s2, n] in im loop
            print(c2, s2, n);
        end loop;

        print();

    end loop;

    if not_single then return OM; end if;       -- as indication of error in action table

    check((bad_cs := {cs: [c2,s2,n] = action(cs) | n notin {-1, 1, 0}}) = {},
             "Illegal tape-motion indicators occur for conditions:",bad_cs);

    check(is_integer(position), "Illegal initial position:", position);
    check(is_tuple(tape), "Illegal initial tape:", tape);
    check(forall t = tape(i) | is_string(t) and # t = 1,"Illegal initial tape", tape);

            -- now add extra blanks to the initial tape if necessary
    if position > #tape then            -- extend tape with additional blank squares
        tape +:= #tape_position * [" "];
    elseif position < 1 then
        tape := (1 - position) * [" "];    -- add extra blank squares to left
        position := 1;                    -- adjust index of position on extended tape
    end if;

    return [action, tape, position, state];

end read_check;

procedure print_tape(tape,position,state);        -- Turing machine tape print utility.
    -- This procedure is used to display the state of the Turing machine tape at
    -- the end of each cycle of simulation

    const sq := 18, hsq := 9;            -- one one-fourth and one-eighth screen size
    const screen_size := 72;            -- number of characters on terminal

    topline := screen_size * "_"; 
    topline(4 * hsq + 1..4 * hsq + 4) := botline := screen_size * "_";

    tape_string := (hsq * " ") +/ tape + (hsq * " ");
                                   -- convert tape to string and pad with blanks.
    tape_string := tape_string(position..position + 2 * hsq - 1);

    picture := +/ ["|" + t + " ": t in tape_string];
    picture(1) := " ";        -- remove first vertical bar.
    print(); print(topline); print(picture + " " + str(state)); print(botline);

end print_tape;

  procedure check(condition, message, quantity); -- utility macro for input utility macro for input-
      if condition then return true;  end if;         -- condition OK
       
       print(message, quantity);         -- diagnostic message and offending quantity
    return false;                    -- as indication of error
                
  end check;

end Turing_simulate;

Here is a small data file that you can use to test the preceding simulator. It describes a Turing machine with just two tape characters 'a' and 'b' and two internal states 'move_left' and 'move_right'. The machine is designed to change any run of a's into which its reading head is intially placed into a run of b's. It starts in the 'move_right' state. As long as it is positioned over an 'a' it stays in this state and continues moving right, one square at a time. When it encounters the character 'b' or ' ' it switches to the 'move_left' state and moves one position to the left. Then, as long as a's are encountered, it changes them to 'b' and continues moving left. After the last 'a' it stops.

The data file that describes this simple program is as follows. The intial tape is [b,a,a,a,a], and the reading head is initially placed on the second character, with the machine in 'move_right' state. The tuple representing the machine's state-transition table comes first in the file.

	[
	[a,move_right,a,move_right,1],
	[b,move_right,b,move_left,-1],
	[" ",move_right,b,move_left,-1],
	[a,move_left,b,move_left,-1],
	]
	[b,a,a,a,a]
	2
	move_right

You may wish to develop other machines of this type, and, if ambitious, try to reconstruct Turing's proof that a properly designed single machine of this simple kind can simulate the action of any computer program.

9.7 Huffman Coding of Text Files

The standard ASCII alphabet of computer characters contains 127 characters, each of which is usually represented at the machine level by a sequence of 8 binary bits. If large volumes of English-language text need to be stored, this internal coding, which uses just as much computer memory space to represent a rare character like 'z' as to represent a common character like 'e', is by no means optimal. It is better to represent frequently occuring characters by shorter sequences of bits, even though this forces one to lengthen the internal encoding of less frequent characters, since overall this will diminish the total storage required to store typical texts. An effective method for using variable- length encodings of this kind was described by David Huffman and has become known as Huffman coding. Huffman's technique is to arrange all the characters to be encoded as the terminal nodes of a binary tree, in the manner shown in Figure 9.1. This tree should be set up so that commonly occurring characters appear near its "root" node and rare characters appear at a greater distance from its root.

There will always exist a unique path from the root node of such a tree to each terminal node or "twig" of the tree, and any such path can always be described by a unique sequence of zeroes and ones, where '0' means "take the left branch" and '1' means "take the right branch down the tree. As the code for a character c we can therefore use the binary sequence describing the path

Figure 9.1 Binary Huffman Tree with Characters Attached to its Terminal Nodes.

from the root node of the tree to the terminal node at which c is attached. For example, the tree shown in Figure 9.1 would assign the code '000' to 'E', the code '0010' to 'T', the code '0101' to I, etc. To encode a sequence of characters, we simply concatenate the sequences of zeroes and ones representing its individual characters. To decode a sequence s of zeroes and ones, we start from the root of the Huffman tree which defines our encoding and use the leftmost bits of s to guide us down a path in the tree. As soon as we reach a twig of the tree we add the character attached to this twig to the sequence of decoded characters we are building up. The sequence of bits that led us to this character is then detached from s, and we return to the root of the Huffman tree and continue the decoding process using what remains of s. The three routines which follow embody this encoding and decoding technique. The 'Huff' procedure takes a character string and encodes it using Huffman's method. 'Puff', which is the inverse of 'Huff', takes the encoded form of a string s and recovers the original form of s. The third procedure, called setup, takes maps 'left' and 'right' representing a Huffman tree and uses them to initialize various global data objects required by the Huff and Puff routines.

The algorithm uses '1' and '0' to represent bits.

program Huffman;          -- Huffman  code procedures

var H_code,           -- maps each character into its Huffman code
        H_root,       -- root node of Huffman tree
        H_left,       -- maps each node of the Huffman tree to its left descendant
        H_right,      -- maps each node of the right tree to its left descendant
        H_char;       -- maps terminal nodes of the Huffman tree to the characters they represent

huff_test;              -- test code for Huffman program

procedure setup(root,left,right,chr); -- auxiliary initialization routine

                -- We begin by using the procedure arguments to initialize all but the first of
                -- the global variables listed above.

        H_left := left; H_right := right;
        H_root := root; H_char := chr;

        -- Next we calculate H_code(c) for each character c

        parent := {[y,x]: [x,y] in (H_left + H_right)};         -- This maps each tree node to its parent

        H_code := {};           -- begin calculating Huffman codes from tree structure

        for c = H_char(node) loop

                bits := "";                      -- initially, path is null

                while node /= H_root loop               -- chain up to the root, noting how we got there
                        bits := if H_left(par := parent(node)) = node then "0" else "1" end if  + bits;
                        node := par; -- step up to parent
                end loop;

                H_code(c) := bits;              -- record Huffman code for the current character

        end loop;

end setup;

procedure Huff(stg);            -- calculates Huffman code for string stg
        return "" +/ [H_code(c): c = stg(i)];   -- concatenate codes of individual characters
end Huff;

procedure Puff(Huff_stg);             -- decodes a Huffman-coded string

        stg := "";-- initialize decoded string
        node := H_root;-- start at Huffman-tree root

        for b = Huff_stg(j) loop -- examine binary bits of Huff_stg in order

                node := if b = "0" then H_left(node) else H_right(node) end if;

                if (c := H_char(node)) /= OM then-- have reached twig
                        stg +:= c; -- append to decoded portion
                        node := H_root;-- restart at Huffman-tree root
                end if;

        end loop;

        return stg;

end Puff;

		-- routines listed below shold be inserted here.....
		
end Huffman;

The encoding and decoding procedures shown sidestep the question of how to find the tree that will give us a maximum degree of text compression. Of course, the rule for finding this tree, given the frequency with which each character occurs in the text we are to encode, is Huffman's essential discovery. His rule is as follows: we begin by finding the two characters c₁, c₂ of lowest frequency. These are then logically "conglomerated" into a single joint character c, of which c₁ and c₂ become the left and right descendants, respectively. We remove c₁ and c₂ from the collection of characters which remain to be processed and replace them by c. Continuing this until only one character remains, we will have bulit the Huffman tree.

Represented in SETL, this procedure is as follows:

procedure Huff_tree(freq);      -- Huffman tree-build routine
    -- freq is assumed to map all the characters of our alphabet into their
    -- expected frequencies of occurrence.

        -- This procedure returns a quadruple [root, left, right, chr] consisting of the
        -- Huffman tree root, its left and right descendancy maps, and a map chr
        -- which sends each terminal node of the tree into the character attached to this node.

        -- Since the code which follows will represent tree nodes by character strings,
        -- the chr map is just the identity map on single-character strings and is
        -- conveniently set up right here.

    chr := {[c,c]: c in domain freq};

    left := right := {};      -- initialize the descendancy mappings

    while #freq > 1 loop        -- iterate till all nodes have been conglomerated  into 1

        [c1,freq_cl] := get_min(freq); [c2,freq_c2] := get_min(freq);
        freq(c := (c1 + c2)) := freq_cl + freq_c2;        -- form "group" character
        left(c) := c1; right(c) := c2;    -- make c1 and c2 descendants of c
    end loop;

    return [arb domain freq, left, right, chr]; -- which is necessarily the tree root

end Huff_tree;

procedure get_min(rw freq);
        -- This auxiliary procedure finds the character c of minimum frequency,
        -- returns c and its frequency, and deletes c from the domain of freq. Note
        -- that it uses a "dangerous" program construction, legal in SETL, but
        -- certainly not recommended for use in any context which is at all complex;
        -- namely it is a function which modifies the argument with which it is called.

    min_freq := min/[f: f = freq(c)];
    assert(exists f = freq(c) | f = min_freq);        -- find the minimizing c
    freq(c) := OM;        -- remove it from the domain of the 'freq' mapping
                        -- this modifies the input argument (which is read-write).
                        -- DANGEROUS!

    return [c,f];        -- return character and its frequency

end get_min;

procedure huff_test;            -- test code for Huffman program
    order := "e taionshrdluqwypfgjkmbvcxz";       -- blank is second most frequent
    freq := {[c,30 - j]: c = order(j)};         -- character in English text.
    [root, left, right, chr] := huff_tree(freq);
    setup(root, left, right, chr);
    print(huff("hello there"));
    print(puff(huff("hello there")));

end huff_test;

The output produced is

		00001011111011110101011010100000001011111111011
		hello there

showing that the Huffman coded form of 'hello there' is 47 bits, or just under 6 bytes long, saving nearly 50% of the original 11 byte message.

Various improvements and extensions of the procedures described in this section appear in Exercises 13-18.

9.8 A Game-playing Program

In this section, we will explore the basic structure of programs which play board games like chess and checkers which involve two players, whom we shall call "A" and "B" . The momentary state s = [p, x] of any such game can be defined by giving the position p of the various pieces or counters used in the game, and by stating which of the players, x = "A" or x = "B" , is to move next. Given any such state s, the rules of the game will determine the moves which are legal and hence will determine the set of all possible new states s₁,..., s_k, exactly one of which must be chosen by the active player, i.e., the player whose turn it is to move. We shall suppose in what follows that the map has_turn(s) determines this player (i.e., has_turn(s) is just x, if as previously s has the form [p,x]). We also suppose that the map next_states(s) gives us the set {s₁,..., s_k} of states to which the active player can move. Any such game will end as soon as certain states, called terminal states, are reached. (In chess, for example, these are the states in which one of the players has been checkmated.)

For purposes of analysis it is convenient to suppose that when a terminal state s is reached, D dollars are transferred from player B to player A. We can suppose either that the sum D is fixed or that it depends on s. It is actually more convenient to make the latter assumption, and we shall do so, supposing accordingly that we are given a function A_wins(s) defined on all terminal states s, and that when a terminal state s is reached the sum A_wins(s) is transferred from B to A. Plainly A is the winner if D > 0, B is the winner if D < 0, and the game counts as a tie if D = 0. It is convenient to suppose that A_wins(s) = OM if the state s is not a terminal state; then the condition A_wins(s) /= OM can be used to test for terminal states. The three functions has_turn(s) (whose value must be either "A" or "B" ), next_states(s), and A_wins(s) serve to encapsulate the basic rules of any two-player game we wish to study.

Next, to begin to understand the strategic considerations which determine the laws of effective play, it is useful to extend the function A_wins(s), which is only defined for terminal states, so that it becomes a function A_can_win(s), defined for all states. We do this in the following recursive way:

	A_can_win(s) = A_wins(s)?			(1)
			if has_turn(s) = "A"  then max/ [A_can_win(sy): sy in next_states(s)]
				else min/ [A_can_win(sy): sy in next_states(s)] end if;

The meaning of this formula can be explained as follows:

If the state s is terminal, the game is over and the amount that A can win is exactly the amount that A has in fact won.
Otherwise, if it is A's turn to move, he will choose the move that is most favorable to him, shifting the game into that state sy in next_states(s) for which A_can_win(sy) is as large as possible. Conversely, if it is B's turn to move, she will defend herself as well as possible against A's attempts to win a maximum amount. B does this by shifting the game into the state sy for which A's attainable winnings are as small as possible. Since A wins what B loses, and vice versa, this is at the same time the state in which B's winnings are as large as possible.

It is not hard to see that if the function A_can_win defined by (1) is known, and if both players expect their opponents to play with perfect accuracy, player A should always use her turn to move to a state sy such that A_can_win(sy) is as large as possible, and player B should always use his turn to move to a state sy such that A_can_win(sy) is as small as possible. To show this, suppose that the sequence of states traversed in the history of a game, from the moment at which it reaches state s up to the moment at which the game terminates, is S = s₁,s₂,..., s_n, Using (1) it is easy to see that if A uses this strategy, A_can_win(sj) will never decrease, so that by using our recommended strategy A guarantees that when the game terminates he will win at least the amount A_can_win(s). Conversely, if B uses the strategy we recommend, then formula (1) shows that A_can_win(sj) will never increase. Hence, if player A ever makes a move which decreases the value of A_can_win from v to some value u which is less than v, then after this B can prevent him from recovering, i.e., from ever winning more than u. If follows that, if A gives his opponent credit for playing optimally, A must never "give ground" in regard to the function A_can_win, i.e., that when it is his turn to move he should always move to a new state s_j such that such that A_can_win(s_j) is as large as possible. (Of course, if he does this, then A_can_win(s_j) = A_can_win(s); see (1)). Reasoning by symmetry, we also see that B should always move to a new state sy such that A can win(s_j) is as small as possible.

These considerations indicate that any game-playing program will need to calculate the function (1). However, if the game being analyzed is at all complex, it will not be feasible just to use the recursive definition (1) as it stands, since the tree of possible moves and countermoves which (1) would examine will tend to grow very rapidly. For example, if at every level A has just 4 possible moves and B has 4 possible countermoves, then 256 different positions can evolve from an initial state s after A and B make two moves each, 64,000 different positions after A and B have made 4 moves each, and hence the recursion (1) would have roughly 16,000,000 positions to examine if we used it to look ahead through all possible combinations of 6 moves of A and 6 countermoves of B.

This makes it plain that it is important to accelerate calculation of the function A_can_win as much as we can. Several techniques for doing this have been developed, but we shall only describe a few particularly important methods of this kind, namely

Memoization of the position-evaluation function
Exploitation of symmetries
The 'alpha-beta' or 'reasonable expectations' pruning method
Heuristic guidance of the best-move search

The first two of these techniques are elementary, but the 'reasonable expectations' pruning method is subtler.

We will use the well-known children's game of 'Tik-Tak-Toe' or 'Noughts and Crosses" as a running example. This game is played on a 3 by 3 grid, originally empty. The players, whom we shall call 'A' and 'B', move alternately. Player 'A' (resp. 'B') can put an 'A' (resp. 'B') at any empty position in the grid. The first player to get 3 identical letters in a row, column, or diagonal wins. If the grid fills without any player winning, it is a tie. (As, alas, it always is if both players move optimally.) The termination and next_states routines for this game are given below. States of the game are represented as pairs [board,mover], where 'mover' is always 'A' or 'B',and 'board' is a tuple of three strings representing the three rows of the grid. Each row is a string of three characters, each of which is 'A', 'B', or '.' (representing an empty square.)

The termination routine, 'A_wins' first finds all 'non-blocked' rows, columns, and diagonals, namely those which do not contain letters placed by both players. If none remain, the game is a tie. If any is full, then the player whose letter occupies this row, column, or diagonal is the winner. Otherwise there is no winner, so OM is returned, and the game goes on.

The 'next_states' routine first checks to see if either player has won, and if so returns an empty list so as not to continue a game that is already decided. Otherwise it fills an empty square with the activeplayes lett in all possible ways, and returns the resulting list of states.

procedure A_wins(s);            -- termination routine for Tik-Tak-Toe

    [board,mover] := s;        -- unpack the state

    only_1_rows := {row in [1..ncols] | #{br: br = board(row)(col) | br /= "."} < 2};       -- rows with only 1 kind of fill
    only_1_cols := {col in [1..ncols] | #{br: row in [1..ncols] | (br := board(row)(col)) /= "."} < 2};    -- cols with only 1 kind of fill
    only_1_diags := {};
    if #{br: row in [1..ncols] | (br := board(row)(row)) /= "."} < 2 then only_1_diags with:= "up"; end if;                -- diagonal with only 1 kind of fill
    if #{br: row in [1..ncols] | (br := board(row)(4 - row)) /= "."} < 2 then only_1_diags with:= "down"; end if;        -- diagonal with only 1 kind of fill

    if only_1_rows = {} and only_1_cols = {} and only_1_diags = {} then return 0; end if;        -- tie

    if exists win_row in only_1_rows | #{col in [1..ncols] | board(win_row)(col) /= "."} = 3 then return if board(win_row)(1) = "A" then 1 else -1 end if;    end if;        -- find winner
    if exists win_col in only_1_cols | #{row in [1..ncols] | board(row)(win_col) /= "."} = 3 then return if board(1)(win_col) = "A" then 1 else -1 end if; end if;    
    if  "up" in only_1_diags and #{row in [1..ncols] | board(row)(row) /= "."} = 3 then return if board(1)(1) = "A" then 1 else -1 end if; end if;        -- find winner
    if  "down" in only_1_diags and #{row in [1..ncols] | board(row)(4 - row) /= "."} = 3 then return if board(1)(3) = "A" then 1 else -1 end if; end if;    -- find winner
    
    return OM;        -- nobody wins
    
end A_wins;

procedure next_states(s);        -- next_states routine for Tik-Tak-Toe
    
    if A_wins(s) /= OM then return []; end if;        -- if the game is decided, don't generate new states
    
    [board,mover] := s;        -- unpack the state
    empties := {[i,j]: i in [1..ncols], j in [1..ncols] | board(i)(j) = "."};
                            -- usable empty spots on board
    
    new_boards := [];
    nm  := if mover = "A" then "B"  else "A" end if;
    for [i,j] in empties loop new_board := board; new_board(i)(j) := mover; new_boards with:= [new_board,nm]; end loop;

    return new_boards;
     
end next_states;

procedure has_turn(s); return s(2); end has_turn;

Even this very simple game has surprisingly many potential states. Each of its 9 grid positions can be in any of 3 conditions: empty, filled with an 'A', filled with a 'B'. Either of the two players can be the next to move. Thus the number of possible states is 2 * 3**9, i.e. 39,366. This make it clear that we want to remember the values A_can_win(s) for states for which this function has already been calculated, rather than repeating its evaluating recursion unnecesssarily over the many thousands of states involved. That is, instead of using the raw evaluation procedure

procedure A_can_win(s); -- position evaluator

    val := A_wins(s)?      
               if has_turn(s) = "A"  then max/ [A_can_win(sy): sy in next_states(s)]
                                else min/ [A_can_win(sy): sy in next_states(s)] end if;
    return val;
    
end A_can_win;

we need to 'memoize' it by introducing a global value 'seen_already', originally empty, which maps states that have already been evaluated into their 'win-lose-tie' values, and then return this value if it is already available. This gives us the following, greatly improved version of the evaluation function, embodying our first improvement, memoization of the position-evaluation function.

procedure A_can_win(s); -- position evaluator
    
    if (val:= seen_already(s)) /= OM then return val; end if;
    
    seen_already(s) := val := A_wins(s)?         -- note value of state
                        if has_turn(s) = "A"  then max/ [A_can_win(sy): sy in next_states(s)]
                                else min/ [A_can_win(sy): sy in next_states(s)] end if;
    return val;
    
end A_can_win;

The overall program that can be built around this evaluator is as follows.

program test;                        -- game playing program

    const ncols:= 3;
    const all_A := ncols * "A",all_B:= ncols * "B";             -- winning and losing configuration

    var seen_already := {};  

    var count := 0;
    board_start := ncols * [ncols * "."];		-- empty Tik-Tak-Toe board
 
    game_start := [board_start,"A"];                  -- empty board, A moves first

    print(if (val := A_can_win(game_start)) = 1 then "A wins" elseif val = -1 then "B wins" else  "Game is tie" end if);
    
procedure A_can_win(s); -- position evaluator
	-- ... as above    
end A_can_win;
	-- other procdeures as above ...
end test;

Execution of this program shows that to evaluate the game outcome (that is, the value of the starting state, which is, as children know, a tie) the code calculates the values of approximately 11,500 game states. This makes it plain that further optimzations are desirable. One such is to exploit the game's symmetry: since the board on which it is played and the rules of play are both symmetrical, we need not evaluate any position whose mirror image has already been evaluated. The following variant of 'A_can_win' exploits this fact, for vertical mirroring of board positions. It illustrates the second general optimization technique listed above, namely exploitation of symmetries.

procedure A_can_win(s); -- position evaluator
    
    if (val:= seen_already(s)) /= OM then return val; end if;
    if (val:= seen_already(mirror(s))) /= OM then return val; end if;
    
    seen_already(s) := val := A_wins(s)?         -- note value of state
                        if has_turn(s) = "A"  then max/ [A_can_win(sy): sy in next_states(s)]
                                else min/ [A_can_win(sy): sy in next_states(s)] end if;
    return val;
    
end A_can_win;
procedure mirror(s); [board,mover] := s; return [[board(3),board(2),board(1)],mover]; end mirror;

Execution of the Tik-Tak-Toe program with this improvement shows that it evaluates the game outcome after examining 6,000 states, about 50% fewer than if we ignore symmetry.

Next we turn our attention from these important but elementary techniques to the third and most sophisticated of the above-listed game-algorithm improvements, 'reasonable expectations' pruning. To introduce the notion of search pruning, we can first observe that for win-lose-tie games like Tik-Tak-Toe, a player A need who has found a winning strategy among the sucessors of a given state s in which he has the move need never examine any additional sucessors, since if the state s arises he (a) can choose the winning strategy, and(b) might as well choose the winning strategy, since in any case the best he can do is win. This leads to the simple following variant of 'A_can_win':

procedure A_can_win(s);         
    if (val:= seen_already(s)) /= OM then return val; end if;
    if (val:= seen_already(rev(s))) /= OM then return val; end if;
    if (v := A_wins(s)) /= OM then return v; end if;
               
     if has_turn(s) = "A" then
            
         max_till_now := -1;
            
         for sy in next_states(s) loop
             if (max_till_now max:= A_can_win(sy)) >= 1 then seen_already(s) := 1; return 1; end if;
         end loop;
    
         return max_till_now;
        
     else        -- has_turn(s) = "B"
    
        min_till_now := 1;
    
         for sy in next_states(s) loop
            if (min_till_now min:= A_can_win(sy)) <= -1 then seen_already(s) := -1; return -1; end if;
         end loop;
    
         return min_till_now;
    
    end if;

end A_can_win;

The fact that the loops in this form of 'A_can_win' are terminated early, i.e., terminated as soon as a winning strategy is found, sometimes improves the efficiency of (3) very substantially; this is the kind of improvement that we want. Executing the Tic-Tac_Toe prgom with this version of 'A_can_win', we see that it establishes that the starting position is a tie after evaluating roughly 4,500 positions, 25% less than the number examined if the search improvement which it embodies is not used.

Indeed, the sooner we can find a winning strategy (if one exists) the more effective such improvements should become. This leads us to the fourth of our improvement methods, heuristic guidance of the best-move search. Here we do not select moves for evaluation in random order, but give preference to moves which are deemed likeliest to lead to a win. The way in wich this is done will depend on the game being played. In our Tik-Tak-Toe case, it is reasonable (i) never to select a positon which lies entirely in rows, columns, and diagonals in which the opposing player has alredy mad a mark(since such a move can never contribute to a win), and (ii) to prefer moves into positions which create the largest number of 'threats' for the opponent.

The following routine incorporates this idea. It rates available empty squares according to the number of rows, columns, and diagonals which run through them and contain no square marked by the opponent. These are then returned in decreasing order of desirability (sorting is done with the standard merge-sort routine found in the sort_pak package distributed with SETL.)

procedure best_move_list(s);                    -- estimate the value of a move to the current mover

    [board,mover]:= s;            -- unpack state
    
    value := {};        -- map from positions to  their values
    opponent := arb({"A","B"} less mover);            -- the other guy

    empties := {[i,j]: i in [1..ncols], j in [1..ncols] | board(i)(j) = "."};
                            -- usable empty spots on board

    for row in [1..ncols] | not (exists cl in [1..ncols] | board(row)(cl) = opponent) loop         -- check rows
        for col in [1..ncols] | board(row)(col) = "."  loop value(row,col) := value(row,col)?0 +  1; end loop;
    end loop;

    for col in [1..ncols] | not  (exists row in [1..ncols] | board(row)(col) = opponent) loop          -- check columns
        for row in [1..ncols] | board(row)(col) = "."  loop value(row,col) := value(row,col)?0 +  1; end loop;
    end loop;

    if not (exists row in [1..ncols] | board(row)(row) = opponent) then           -- check rising diagonal
        for row in [1..ncols] | board(row)(row) = "." loop value(row,row) := value(row,row)?0 +  1; end loop;
    end if;

    if not (exists row in [1..ncols]|  board(row)(4 - row) = opponent) then     -- check falling diagonal
        for row in [1..ncols] | board(row)(4 - row) = "." loop value(row,4 - row) := value(row,4 - row)?0 +  1; end loop;
    end if;
    
    empties -:= domain(value);        -- keep the senseless moves, just in case
    
    return [y: [x,y] in merge_sort([[-y,x]: [x,y] in value])] + [x: x in empties];        -- return the inverse  map, as a list
    
end best_move_list;


The following slightly modified version of 'next_states' uses the state ordering produced by 'best_move_list'.

procedure next_states(s);        -- next_states routine for Tic next_states routine for Tik-Tak-Toe
    
    if A_wins(s) /= OM then return []; end if;        -- if the game is decided, dont generate new states
    [board,mover]:= s;            -- unpack state
    new_boards := [];            -- will build list
    
    nm  := if mover = "A" then "B"  else "A" end if;            -- the other guy
    for [i,j] in best_move_list(s) loop new_board := board; new_board(i)(j) := mover; new_boards with:= [new_board,nm]; end loop;
--print("new_boards: ",#new_boards);    if #new_boards = 0 then print(A_wins(s),s); end if;
    return new_boards;
     
end next_states;

With this further improvement the program establishes that the starting position is a tie after evaluating roughly 4,500 positions, no real improvement over our last method. The probable reason is that, since both players aim for a win, they explore alomost all move-tree banches other than than those that our earlier improvements have already eliminated. However, carful ordering of the next move list is of clear benefit when used in connection with the more sohisticated pruning techniques discussed below.

The precding discussion suggests the following variant evaluation procedure, which comes closer to the idea of 'reasonable expectations' search pruning. Suppose that one of the players, knowing that Tik-Tak-Toe is always a tie in the case of optimal play by both sides, decides to rest content with a tie instead always of trying for a win. (This 'conservative' play will lead to the same outcome in the case of an opponent playing optimally, but will miss chances to win against an opponent who makes errors.) This is easily done by changing the one line

        if (max_till_now max:= A_can_win(sy)) >= 1 then seen_already(s) := 1; return 1; end if;

in our earlier version of 'A_can_win' to

     if (max_till_now max:= A_can_win(sy)) >= 0 then seen_already(s) := max_till_now; return max_till_now; end if;

With this change the program confirms that the starting position is a tie after evaluating only 425 positions. (Without 'best move first' ordering, 675 positions would be evaluated instead.) This drastic improvement in search efficiency suggests pushing this idea as far as we can, which can be done as follows. Suppose that instead of evaluating the simple function A_can_win(y). which depends only on the current game state y, we elaborate it to the following two-parameter variant:

mover_can_win(s,hoped_for)

Here as before, s is a game state in process of evaluation, but now the extra parameter 'hoped_for' is the amount that the player active in state s can reasonably hope for, given the amount that the opponent player may be certain of in the context in which s comes under consideration. Let WORST denote the value of the worst outcome that a player need fear (in our Tic-Tak-Toe example, this is -1, representing loss of the game.) Then we can write the following recursion for 'mover_can_win':

procedure mover_can_win(s,hoped_for);            -- returns  amount that the active player can win if game state s is reached

       if (val := mover_wins(s)) /= OM then return val; end if;
            -- if the game has ended, return its outcome
        
    certain_of := WORST;            -- on first encountering an unexplored state, we fear the worst
    [board,mover] := s;                -- unpack the state
    
    for sy in next_states(s) loop

        [-,new_mover] := sy;        -- see who moves next

        if mover = new_mover then     -- I have another turn
            certain_of max:= mover_can_win(sy,hoped_for);        -- continue in whatever way is best
        else             -- it becomes the opponent's turn
            certain_of max:= -mover_can_win(sy,-certain_of);
                -- if I am already certain of winning the amount 'certain_of',  the opponent cannot  
                -- possibly win more than the negative of this amount, and should not hope to do more.  
                -- Also, if the value of 'sy' to the opponent is x, it is  Also, if the value of 'sy' to the opponent is x, it is -x to me.
        end if;

        if certain_of >= hoped_for then return certain_of; end if;
            -- no point continuing search if best possible outcome has been found

    end loop;
    
    return certain_of;        -- this state value is now known
    
end mover_can_win;

This variant of best-move searching expresses the idea of 'reasonable expectations' or'alpha-beta' search pruning. The memoized version of this, with the same use of symmetry as before, is

procedure mover_can_win(s,hoped_for);            -- returns  amount that the active player can win if game state s is reached

     if (val:= seen_already(s)) /= OM then return val; end if;        -- memoization
       if (val:= seen_already(rev(s))) /= OM then return val; end if;

       if (val := mover_wins(s)) /= OM then seen_already(s) := val; return val; end if;
            -- if the game has ended, return its outcome
        
    certain_of := WORST;            -- on first encountering an unexplored state, we fear the worst
    [board,mover] := s;                -- unpack the state
    
    for sy in next_states(s) loop

        [-,new_mover] := sy;        -- see who moves next

        if mover = new_mover then     -- I have another turn
            certain_of max:= mover_can_win(sy,hoped_for);        -- continue in whatever way is best
        else             -- it becomes the opponent's turn
            certain_of max:= -mover_can_win(sy,-certain_of);
                -- if I am already certain of winning the amount 'certain_of',  the opponent cannot  
                -- possibly win more than the negative of this amount, and should not hope to do more.  
                -- Also, if the value of 'sy' to the opponent is x, it is  Also, if the value of 'sy' to the opponent is x, it is -x to me.
        end if;

        if certain_of >= hoped_for then seen_already(s) := certain_of; return certain_of; end if;
            -- no point continuing search if best possible outcome has been found

    end loop;
    
    return seen_already(s) := certain_of; certain_of;        -- this state value is now known
    
end mover_can_win;

procedure mover_wins(s);             -- symmetrical form of A_wins
    [-,mover] := s; 
    return if (val := A_wins(s)) = OM then OM elseif mover  = "A" then val else -val end if; 
end mover_wins;

This variant of the evaluation algorithm shows that Tik-Tak-Toe is a tie after evaluating only 1500 position, not nearly as few as needed when one of the players tries only for a tie, but nevertheless much bettr than the 4500 evaluations needed if reasonable expectations pruning is not used. (Not also that if we omit the use of best-move-first ordering, the Tik-Tak-Toe evaluation cost of reasonable expectations pruning rises to 2700.

In spite of this very substantial improvement in search time which 'reasonable expectations pruning achieves, complex games will still lead to trees of moves which are so deep and branch so rapidly that full exploration using this algorithm is quite impossible. One technique used to cope with this fundamental difficulty is to limit the number of recursive levels that the code explorea. When this limit is reached, we use some ad hoc estimate, called an evaluation heuristic, to approximate the value of A_can_win(s). In effect, this approach pretends to replace the full game that we would like to analyze by a truncated game that is played for some limited number L of moves and then terminated with a payoff determined by the evaluation heuristic. To play the full game, we then reanalyze this truncated game each time it is a given player's turn to move and choose the best move in the truncated game as her recommended move in the real game. Assuming that A_estimate(s) is the estimated value of state s to player A, it is easy to modify the preceding codes to incorporate such a limit on the number of levels of move and countermove that will be examined. Doing so, we get

procedure est_mover_can_win(s,hoped_for,limit); 
	           -- returns estimated amount that the active player can win if game state s is reached

     if (val:= seen_already(s)) /= OM then return val; end if;        -- memoization
     if (val:= seen_already(rev(s))) /= OM then return val; end if;

     if (val := mover_wins(s)) /= OM then seen_already(s) := val; return val; end if;
            -- if the game has ended, return its outcome

    if (lim -:= 1) = O then
        val := if has_turn(s) = "A"  then A_estimate(s) else -A_estimate(s) end if;
        seen_already(s) := val; return val;
    end if;
        
    certain_of := WORST;            -- on first encountering an unexplored state, we fear the worst
    [board,mover] := s;                -- unpack the state
    
    for sy in next_states(s) loop

        [-,new_mover] := sy;        -- see who moves next

        if mover = new_mover then     -- I have another turn
            certain_of max:= mover_can_win(sy,hoped_for);        -- continue in whatever way is best
        else             -- it becomes the opponent's turn
            certain_of max:= -mover_can_win(sy,-certain_of);
                -- if I am already certain of winning the amount 'certain_of',  the opponent cannot  
                -- possibly win more than the negative of this amount, and should not hope to do more.  
                -- Also, if the value of 'sy' to the opponent is x, it is  Also, if the value of 'sy' to the opponent is x, it is -x to me.
        end if;

        if certain_of >= hoped_for then seen_already(s) := certain_of; return certain_of; end if;
            -- no point continuing search if best possible outcome has been found

    end loop;
    
    return seen_already(s) := certain_of; certain_of;        -- this state value is now known
    
end est_mover_can_win;

9.9 Implementation of a Macroprocessor

Languages of relatively low level, like assembly language and C, are very commonly provided with macroprocessors, but a mccro capability can be a useful adjunct to languages of higher level also.

Macros are abbreviations that obviate the need to write similar pieces of code repeatedly. They allow programmers to introduce convenient shorthand notations for constructs that are used many times in a program. Macros, like procedures, are defined once and can then be used several times.

Macros and procedures resemble each other in that both give ways of associating names with bodies of code text and of invoking this code when the name is mentioned. However, when a macro is mentioned in a program after having been defined, the program text which it represents is substituted directly for the invoking occurrence of the macro name; this substitution is called macro-expansion and is to be contrasted with the detour-and-return action (see Section 5.1) triggered by a procedure invocation. That is to say, macros make use of a purely textual mechanism; they simply replace the name of the macro by its definition at the point where the name appears. This means that unlike procedures (which can be invoked before their definition has been seen), macros must be defined before they are used; i.e., the definition of a macro must appear physically in a program before the macro is first used.

In this section we will first describe a general purpose macro capability and then show how to implement it. We allow macros of three basic kinds: parameterless macros, macros with explicit parameters, and macros with both explicit and generated parameters. These have the following three forms:

(1)	MACRO m_name;
	   macro-body
	ENDM;

(2)	MACRO m_name(p_name₁,p_name₂...,p_name_k);
	   macro-body
	ENDM;

(3)	MACRO m_name(p_name₁,..., p_name_k; gp_name₁,..., gp_name_n);
	   macro-body
	ENDM;

Macros without parameters provide for the simplest kind of abbreviation: the name of such a macro simply stands for its macro body, which replaces the macro name whenever this name appears. For example, we can write:

	MACRO countup;			(4)
		t := t + 1;
		if t > limit then
		errmsg('out of bounds');
		end if
	ENDM;

Following the appearance of definition (4) in a program, module, or procedure, any subsequent appearance of the name countup, for example, in the line

			(5) countup

triggers replacement of (5) by the body of (4), i.e., by the four lines of text shown previously.

Macros with parameters

Macros with parameters are introduced by macro definitions of the form

	MACRO m_name(p_name₁,...,p_name_k);
	   macro-body
	ENDM;

Here, m_name can be any legal identifier, which becomes the name of the macro introduced by (1); p_name₁,...,p_name_k, called the formal parameters of the macro, can be any list of distinct identifiers. The body of the macro can be any sequence of tokens.

After being introduced by a macro definition (1), the macro m_name can be invoked simply by using its name, followed by a list of k actual arguments, at any place within a text. Suppose, to be specific, that this invoking occurrence is

		m_name(arg₁,..., arg_k)	(2)

Then the macroprocessor implemented below replaces the macro invocation (2) with an occurrence of the body of the corresponding macro definition (1), but in this body every occurrence of a formal parameter name p_name_j will have been replaced by an occurrence of the corresponding argument arg_j. We emphasize again that this is to be done by replacement of text, and not, as in the case of a procedure call, by evaluation of arguments and transmission of their values. This means that the arguments arg_j of macro invocation need not even be complete, evaluable expressions; indeed, they can be arbitrary sequences of keywords, operator signs, constants, or identifiers. (However, since commas are used to separate the successive arguments of a macro invocation, no argument of such an invocation can contain an embedded comma.) This gives macros a syntactic flexibility which procedures do not have and which is sometimes useful. Suppose, for example, that we wanted to print out a series of examples illustrating the use of the compound operator in a language like SETL. In SETL, this could be done directly by using the following code:

	v:= [1,2,3,4,5];
	print("Combining the components of v using the operator + gives", +/v);
	print("Combining the components of v using the operator * gives", */v);
	print("Combining the components of v using the operator max gives",max/v);

By using a suitable macro, we could abbreviate this repetitive code, as follows:

	MACRO print_op(opsign, op);
		print("Combining the components of v using the operator", opsign,
			"gives", op/v)
	ENDM;
	v := [1,2,3,4,5];

	print_op("+", +);		(3)
	print_op("*", *);
	print op("max", max);

This illustrates the possibility of transmitting an isolated operator sign to a macro as an argument; notice that no corresponding possibility exists for procedures.

Macros with generated parameters

In addition to its ordinary parameters and arguments, macros can make use of generated parameters which play the role for macros that local variables play for procedures. To make use of this feature we write macro definitions having the form

	MACRO m_name(p_name₁,...,p_name_k; gp_name₁,...,gp_name_n);		(6)
	   macro-body
	ENDM;

The additional parameters gp_name₁,...,gp_name_n appearing after the first semicolon in (6) but not in (1) are called generated parameters. The programmer does not supply arguments corresponding to parameters of this kind when a macro like (6) is invoked. Instead, one invokes a macro like (6) in exactly the same way as the macro (1). However, when a macro like (6) with generated parameters is invoked, the macroprocessor generates new tokens (of an artificial form that cannot be used accidentally by the programmer) and substitutes them for occurrences of the corresponding generated parameter names in the body of (6).

A common use of this option is to generate a supply of fresh variable names when these are required for local use within the substituted body of a macro. Suppose, for example, that we want to write a macro which tests the value of an expression e for membership in a given set s, and which returns immediately from the procedure invoking the macro in case the test (e in s) fails. Suppose also that in case of failure we want to return both a numerical error indication and the value of the expression e. If we write

	 MACRO double_check(e, error_no); if e notin s then return [error_no, e];
	end if ENDM;

we would not get exactly the desired effect because when this macro is invoked, it will insert the actual argument for e in two places, which will lead to repeated evaluation of e (notice that e appears twice in the body). For example:

	double_check(f(y) + g(y), 15);

would expand into

	if f(y) + g(y) notin s then
	   return [15,f(y) + g(y)];
	end  if;

In order to avoid this double evaluation we can use the following macro (which has a :generated parameter);

	MACRO in_check(e, error_no; temp)
	   if (temp := (e)) notin s then return [error_no, temp]; end if	(7)
	ENDM;

To invoke this macro we would, for example, write

	in_check(t +:= x,1);
		...
	in_check(t +:= y,2);  	(8)
		...

Note that if (as in (8)) an argument expression e, causing some side effect, is passed to the macro (7), it becomes essential that the value of e should be assigned to an auxiliary variable (the generated parameter temp) and that e should not be evaluated twice. Note also that each use of (7) will generate a new name for the parameter temp so that no accidental interference will occur between invocations of this macro. Finally, note the use of a precautionary extra pair of parentheses around the occurrence of the parameter e in the body of (7); these parentheses ensure that the argument transmitted to the macro in place of e will be handled as a unit, no matter what its actual syntactic form happens to be.

Macro nesting

Macro bodies can contain invocations of other macros; and macro names can be transmitted to other macros as arguments. For example, suppose that we define the following two macros:

	MACRO triple(pa);
	   pa, pa, pa
	ENDM;

	MACRO q;
	   "hello there"
	ENDM;

Then, after expansion, the macro invocation

		triple(q)

becomes

	"hello there", "hello there", "hello there"

This example illustrates the fact that macro expansion is outside-in and recursive. That is to say, the expansion of a given macro body may trigger the expansion of an inner macro invocation.

Macro bodies can also contain embedded macro definitions. For example, the definition

	MACRO def_x(pa);
	   MACRO x; pa endm;		(9)
	ENDM;

is legal. An embedded macro definition imd becomes active when one invokes the macro M in which imd is embedded, thus causing the body of M to be expanded. As an example, note that after expansion the sequence

	def_x("aaa");
	x x x
	def x("bbb");		(10)
	x x x

becomes

	"aaa" "aaa" "aaa"
	"bbb" "bbb" "bbb"		(11)

This happens in the following way. The first line in (10) is expanded and, according to the definition (9), becomes the macro definition

		MACRO x; "aaa" ENDM;

Subsequently the second line of (10) is expanded. It generates the first line of (11). After this, the third line of (10) is expanded into

		MACRO x; "bbb" ENDM;

This changes the meaning of the macro x, causing the fourth line of (10) to expand into the second line of (11).

Dropping and redefining macros

If a macro is only needed over a limited portion of a program, it is possible to "undefine" it so that the name of the macro can be used for another purpose. To erase a macro definition, one uses a macor definition with a completely empty macro body (the body should not even contain blanks), as in

		MACRO x;ENDM;

Once a macro has been dropped, its name reverts to 'ordinary token' status. For example,

	MACRO x; print("now you see it") ENDM;
	x;
	MACRO x;ENDM; 	-- this drops x from macro-status
	x;
	MACRO x; print("now you don't"); ENDM;
	x;

expands into

	print("now you see it");
	x;
	print("now you don't");

This follows since the first line of (12) makes x a macro equivalent to "now you see it," but then the third line of (12) drops x from macro status, so that the fourth line of (12) carries over unchanged to become the second line of (13). The new definition of x is then seen, invoked, and expanded.

Considerably more elaborate macro features than those we have described are supported by many programming languages, especially by assembly languages. However, high level languages like SETL have less need for complex macro features than do lower-level languages, and thus the macro facility described in the preceding paragraphs would be found adequate for a language like SETL. Let us remark that macros, like procedures, can perform the useful function of hiding low-level details and thus help make a program more readable and more modular. The information-hiding capability of macros is most useful when we want to shield a program from possible changes in the structure of composite objects which it manipulates.

Implementation

The context within which our macroprocessor is to be implemented is assumed to be as follows:

(i) The macroprocessor reads a succession of tokens, obtained by decomposing some input file into successive tokens.

(ii) When the special token MACRO is encountered, a macro definition is opened. This token must be followed by a macro name, which can in turn be followed by a list of formal parameters and generated formal parameters, in the manner explained in Sections 8.2.3 and 8.2.4. The macro body following such a macro opener is collected and saved in a map def_of, which associates each macro name with its list of parameters, its list of generated parameters, and its macro body.

(iii) When a macro invocation starting with a token belonging to the domain of the map def_of is encountered, its actual arguments are collected, and the invocation is replaced by a substituted version of the macro body. This substituted text is logically inserted immediately in front of the remainder of the input file and reprocessed by the macro-expansion mechanism, thereby ensuring that macro invocations and definitions embedded within macro bodies will be treated in the manner described in Section 8.2.5.

(iv) The macroprocessor makes various syntactic checks. For example, it checks that the parameters appearing in a macro definition are all distinct, and that each macro invocation has as many arguments as the corresponding macro definition has parameters. If an error is detected, a diagnostic message is printed, and any macro action in progress is simply bypassed.

(v) The macroprocessor is structured as a module, which exports just one procedure, namely a parameterless procedure called next_tok, which can be called repeatedly to obtain the sequence of tokens representing the input file after macro expansion. When the input file is exhausted, next_tok will return OM. The macroprocessor module imports just one procedure, namely a parameterless procedure called input_tok. Successive calls to input_tok generate the sequence of input tokens which constitute the macroprocessor's initial input.

program macroprocessor; -- macroprocessor test
    use get_lines_pak;        -- use text  input utility
    var     
         gmac_ctr,        -- counter for generated macro arguments
         def_of,            -- maps macro names into their definitions
         expanded_tokens;    -- vector of tokens obtained by prior macro expansion
    
    var line_no,line_now,text;     -- globals for input reader
    
    const alphanums :=
        "abcdefghijklmnopqrstuvwxyzABCEDFGHIJKLMNOPQRSTUVWXYZ";
    
    const Illformed_list := "ILLFORMED MACRO PARAMETER LIST";
              -- error message to avoid later trouble in macro arglist
    const comma := ",";
    
    line_now := "";        -- initially line is empty
    line_no := 1;        -- initialize line count

    gmac_ctr := 0;     -- generated macro argument counter
    def_of := {};     -- initially no definitions
    expanded_tokens := [ ];  -- initially no prior tokens
    text := get_lines("macro.in"); -- read input
    
    toks := [];
    
    while (wd := next_tok( )) /= OM loop toks with:= wd; end loop;
    print("" +/ toks);

procedure input_tok(want_blanks);    -- input reader; operates in a 'want blanks' and an 'ignore blanks' mode
    
    if (blanks := span(line_now," \t\n\r"))  /= "" and want_blanks then return blanks; end if;
                         -- remove or return whitespace

    while line_now = "" loop    -- get next line of text
        if line_no > #text then return OM; end if;    -- signal end of input
        line_now := text(line_no) + "\n"; line_no +:= 1;    -- otherwise set up line
        if (blanks := span(line_now," \t\n\r"))  /= "" and want_blanks then return blanks; end if;
    end loop;
    
    wd := span(line_now,alphanums);

    return if wd /= "" then wd else len(line_now,1) end if;        -- return next word, or at least one character

end input_tok;

procedure another_tok(want_blanks);    -- "token feeder" for macro processor
    -- This returns the token standing at the head of expanded_tokens unless
    -- expanded_tokens is empty, in which case it calls the "primary" token source
    -- input_tok to get the token to be returned. It also operates in a 'want blanks' and an 'ignore blanks' mode
    
    while (tok fromb expanded_tokens) /= OM loop
        if tok(1) notin " \t\n\r" or want_blanks then return tok; end if;    -- return nonblank, or blank if wanted
    end loop;
    
    return input_tok(want_blanks);

end another_tok;

procedure next_tok;
        -- called to obtain successive tokens in the sequence of tokens generated by macro expansion
        -- tokens returned by 'input_tok' are returned after all tokens generated by prior macro expansions
        -- have been returned.

    loop    -- we return to this point whenever macro errors are detected
    
          if (tok := another_tok(true)) = OM then return OM; end if;
                                         -- end of input file encounted
    
        if (tok /= "MACRO") and (mdef := def_of(tok)) = OM then        -- token is ordinary;
            return tok;
        end if;
        
        if tok = "MACRO" then        -- start new macro definition
    
            if (parm_list := get_parm_list()) = OM
                    or (mac_body := get_macro_body()) = OM then
                continue;        -- since macro is bad
            end if;
        
            [mac_name,mac_pars,mac_gpars] := parm_list;         -- get macro name and parameters
        
            def_of(mac_name) := if mac_body = [] then OM     -- macro drop
                                else 
                                    [mac_pars,#mac_gpars,template(mac_body,mac_pars,mac_gpars)] end if;
        
        else           -- have a macro invocation
        
            [mac_pars,n_gpars,mac_template] := mdef;        -- look up macro definition
        
            if (arg_list := get_arg_list(#mac_pars)) = OM then    -- abort expansion
                continue;        -- since number of arguments and number of parameters differ
            end if;
    
            for n in [1..n_gpars] loop arg_list with:= [generated_parm()]; end loop;
                     -- generate additional parameters as required and replace the macro at the
                     -- start of the expanded_tokens vector by its expansion
        
            expanded_tokens := 
                [] +/ [if is_string(mac_tok) then [mac_tok] else arg_list(mac_tok) end if:
                        mac_tok = mac_template(j)] + expanded_tokens;
        end if;
    
            -- now that macro has been expanded, the top of the loop will try again to
            -- supply the requested token
    end loop;

end next_tok;

procedure get_parm_list;      -- gets sequence of parameters for macro
    -- The sequence of parameters collected by this procedure must be a
    -- comma comma-separated list opened by a left par~tnthesis and closed by a right
    -- parenthesis. If this syntax is violated, or if two parameters are identical,
    -- an error message is printed, and OM is returned.

    have_gen_parms := false; -- flag: No generated parameters yet

    mac_parms := mac_gparms := [];     -- initializes parameters and generalized parameters

    if (name := namecop := another_tok(false)) = OM or name /= span(namecop,alphanums) then

        print("ILLFORMED MACRO NAME");
        return OM;

    elseif (tok := another_tok(false)) = ";" then        -- terminating semicolon, macro with no parameters
 
         return [name,[],[]];     -- no parameters

    end if;

    if not check(tok = "(",Illformed_list) then return OM; end if;        -- otherwise macro list must open with paren

    until tok = ")" loop        -- until terminating parenthesis

        if not check((tok := another_tok(false)) /= OM, illformed_list) then return OM; end if;        -- get a token
        
        if have_gen_parms then mac_gparms with:= tok; else mac_parms with:= tok; end if;
                                        -- this must be a parameter or generalized parameter
        if not check((tok := another_tok(false)) /= OM, illformed_list) then return OM; end if;        -- get following punctuation token

        if not check(tok = "," or (tok = ";" and not have_gen_parms) or tok = ")", illformed_list) then return OM; end if;
        if tok = ";" then have_gen_parms := true; end if;

    end loop;

    if not check(another_tok(false) = ";", Illformed_list) then return OM; end if;            -- look for terminating semicolon

    return [name,mac_parms,mac_gparms];        -- return macro name, parameter names, and generated parameter names

end get_parm_list;

procedure get_arg_list(n);      -- gets specified number of arguments for macro

    if n = 0 then return []; end if;        -- no arguments to get
    
    arglist := [];            -- will collect
    current_arg := [];        -- initialize current argument as empty token sequence
    if another_tok(false) /= "(" then return OM; end if;        -- check for "(" which opens arglist
    
    while n > 0 loop          -- collect successive arguments

        if (tok := another_tok(true)) = OM then return OM; end if;            -- end of file reached

        if tok /= "," and not (tok = ")" and n = 1) then 
            current_arg with:= tok;        -- not delimiting comma or terminating ")", so collect
        else                -- have a delimiting comma

            arglist with:= current_arg; current_arg := [];        -- collect an argument and start a new ome
            n -:= 1;            -- decrement the number of remaining arguments

        end if;

    end loop;
    
    if tok /= ")" then return OM; end if;        -- check for ")" which terminates arglist
    
    return arglist;
     
end get_arg_list;

procedure get_macro_body;        -- collects sequence of tokens up to ENDM
        -- to allow multiple levels of macro nesting, this procedure must count nesting levels, 
        -- incrementing the level each time that the token "MACRO" is encountered, and decrementing 
        -- it each time "ENDM" occurs. Only an occurence of "ENDM" at the original 0 level ends the body. 
    
    mbody := [ ];            -- collected body
    macnum  := 0;            -- number of embedded macros
    
    loop        -- here we accept whitespace tokens
        tok := another_tok(true);

        if not check(tok /= OM, "MACRO BODY NOT PROPERLY ENDED: ") then return OM; end if;

        if tok = "MACRO" then macnum +:= 1; end if;
        if tok = "ENDM" then if (macnum -:= 1) = -1 then exit; end if; end if;
        mbody with:= tok;

    end loop;

    return mbody;

end get_macro_body;

procedure template(mac_body,mac_pars,mac_gpars);
    -- This procedure builds the "macro template" stored as the definition of a
    -- macro. The template consists of the string constituting the macro body,
    -- but with every parameter and generated parameter replaced by an integer.

     counter := 0;    -- start count at zero

     replacement := {[t,(counter +:=1)]: t in mac_pars + mac_gpars};
             -- This maps every macro parameter into its replacement integer

     return [replacement(t)?t: t in mac_body];

end template; 

procedure generated_parm;     -- auxiliary procedure to produce generated macro parameters.
    -- The macro parameters generated by this procedure have the form 'ZZZn', 
     -- where n is the string representation of an integer. ~ where n is the string representation of an integer. ~-

    return "ZZZ" + str(gmac_ctr +:= 1); 

end generated_parm;

procedure check(condition,msg);            -- error  checkutility
    return if not (condition) then err_msg(msg) else true end if;
end check;

procedure err_msg(message); print(message); return false; end err_msg;    -- error message utility

end macroprocessor;

The folloing small input file can be used to test the macroprocessor code given above.

	First some simple macros.
	MACRO aaa(xxx , fff;
	  uuu,vvv); xxx yyy uuu fff ffff vv vvv ENDM
	whatza.in aaa(a b c,d) bbb ccc
	MACRO aaa;ENDM
	aaa you me hoo
	Now for some nested macros!
	MACRO aaa; MACRO aaa(xxx , fff;
	  uuu,vvv); xxx yyy uuu fff ffff vv vvv ENDM MACRO uuu; MACRO vvv; www_www ENDM ENDM ENDM
	aaa
	aaa(BBB,CCC) uuu vvv

EXERCISES

1. A nondeterministic Turing machine is a Turing machine TM whose action mapping is not constrained to be single-valued. In addition, one particular internal state of each such machine must be designated as its "failed" state. Such machines can be regarded as describing indefinitely large families of computations which proceed in parallel. More specifically, we start with a given tape, tape position, and internal machine state, as in the case of an ordinary Turing machine. Then, whenever the internal state and the character under the machine's read head are such that action(character,state) is multivalued (consisting, say, of n values), we create as many logical copies of the machine as needed and assign one of them to take each of these n actions and continue the computation. This can generate a rapidly expanding set of computations, all proceeding in parallel. If a particular logical copy TM_j of TM reaches the special "failed" internal state, the particular path of computation which it is following ceases, and TM_j is simply deleted. As soon as any computation TM_k reaches an ordinary "stop" condition all other computations are deleted, and the result calculated by this successful logical copy TM_k of TM becomes the final result of the nondeterministic computation. On the other hand, if all computations TM_k reach the "failed" internal state, the nondeterministic Turing machine computation is considered to have failed. Modify the Turing machine simulation program shown in Section 9.6 so that it can simulate both ordinary and nondeterministic Turing machines.

2. A multitape Turing machine is one which has several separate tapes, with a read-write head on each, whose action on each cycle is determined by its internal state and by the characters found under all of its read-write heads. Modify the Turing machine simulation program shown in Section 9.6 so that it can simulate multitape Turing machines with any specified number of heads.

3. Can you think of any well-defined computing automaton or computational process whose activity could not be simulated by a SETL program? Review Exercises 1 and 2 before you answer.

4. The macroprocessor shown in Section 9.9 is programmed to regard every comma in a macro argument list as a separator. For example, if my mac is a macro name, then the invocation

          my_mac(f(x, y), z)

is considered to have three components, namely

         f(x      y)      z

This is not the best convention: it would be better to regard commas contained within parentheses or brackets as being invisible to the macroprocessor, so that the macro call shown would be regarded as having just two arguments f(x,y) and z. Modify the macroprocessor so that it behaves in this way.

5. (Continuation of Ex. 4). Especially if the modification suggested in Exercise 4 is made, use of a macroprocessor becomes subject to two dangers:

(a) If the parenthesis terminating an argument list is missing, much of the body of text following a macro invocation may be swallowed up in what appears to be a very long final argument.

(b) If the keyword ENDM ending a macro is missing or misspelled, the text following a macro definition may be swallowed up by the macro definition.

Modify the macroprocessor of Exercise 4 so as to limit each macro argument to 50 tokens and each macro definition to 200 tokens.

Exercises related to the "check processing" system of Section 9.5

6. Modify the check processing system so that it tracks

(a) The total dollar volume of transactions handled each day.

(b) The total dollar credit/debit that the bank using the system has built up against each of its correspondent banks. These quantities should be printed out as additional information by the DAY transaction.

7. Modify the check processing system, adding a new transaction DEL which prints out a list of all accounts for which more than a month has gone by without at least 10% of a customer's outstanding overdraft debit having been paid.

8. Modify the check processing system, adding the following two transactions:

(a) A transaction AB ('abuse') which shows all accounts for which an excess overdraft has accumulated or against which more than 10 ''insufficient funds" charges have been made during the current month.

(b) A transaction | ("idle") which shows all accounts against which no checks have been drawn during the past 6 months.

9. Modify the check processing system, adding transactions O and CL which allow new customer accounts to be opened and closed. Closing of accounts should be handled carefully: such accounts should be marked as having been closed but should not actually be deleted while there exist outstanding transactions, still to be returned by other banks, that might affect the account which is being closed. When an account is finally closed, the balance remaining in it should be used to pay off any outstanding overdraft_debit, and a check for the amount remaining in the account after this final payment should be prepared for mailing. How will you handle an account closing when the balance remaining is insufficient to pay off the overdraft debit?

10. Modify the check processing system so that it can add a short advertisement to the monthly statements being prepared for mailing to customers. The text of this advertisement should be supplied by a transaction of the form

advert n

where n is an integer, and where this line will be followed by n more lines giving the text of the advertisement. This transaction must be run just before the DAY transaction which triggers preparation of monthly statements.

11. If you have a checking account, save the next monthly statement you get from your bank, and scrutinize it carefully. How many of the features of this statement suggest that your bank is using a program similar to the check processing program shown in Section 9.5? What features reveal the use of processing steps that our simplified check processing system does not perform? If you can find any such feature, choose one of them and modify the check processing system to include it.

12. To what kinds of human error (e.g. misread or illegible checks, bad adresses entered for accounts, etc.) might the check processing system described in Section 9.5 be exposed? How would it neeed to be modified to make detection of such errors easier, and to allow them to be corrected manually?

13. The degree of compression attained by the Huffman coding procedure shown in Section 9.7 can be increased by using the fact that the probability of encountering a character depends on the character that has just been encountered. That is, we can calculate not one, but a whole family of Huffman trees, one for each high- probability character c in our alphabet; this tree should position other characters d according to the probability that d follows c.

Develop a modified Huffman package which uses these more refined probabilities, and also a modified Huff_tree code which calculates all the Huffman trees required.

14. Storing a Huffman tree requires memory space proportional to the size of the alphabet whose characters are attached to the terminal nodes of the tree. If the improved technique described in Exercise 13 is used, such a tree will have to be stored for each character in the alphabet, and the amount of space required for this can grow unpleasantly large (especially if the data compression procedure is to be reprogrammed for a small machine). In this case, the following expedient can be used to reduce the amount of storage required:

(a) For each character c, establish a limit L(c) which will bound the number of nodes used in the modified Huffman tree built from the frequency count developed for letters following c. This limit should be larger for commonly occurring characters c, smaller for infrequent characters.

(b) For each c, find the L(c) characters which most frequently follow c and "lump" all the other characters into a new character c'. The sum of the frequencies of all these "lumped" characters then becomes the frequency of c'.

(c) Build a Huffman tree for the alphabet of L(c) + 1 characters left after step (b). Then let the code of each character not "lumped" into c' be determined as in Exercise 13, but let the code of each character x "lumped" into c' be the concatenation of the normal Huffman code of c' with the standard internal SETL code of c.

Modify the Huffman encode/decode procedures to incorporate this space- saving refinement.

15. If the "Huff' and "Puff" procedures shown in Section 9.7 are really to be used for compressing large texts, we will want them to produce densely packed character strings rather than SETL-level sequences of zeroes and ones. To achieve this without having to abandon SETL in favor of a language in which sequences of bits can be manipulated directly, we can break the sequence of zeroes and ones that "Huff" would most naturally produce into 8-bit sections, each of which is then represented by a single SETL character. Conversely, when decoding, we can first convert each character in the string being decoded into a string of zeroes and ones. Modify the Huffman routines shown in Section 9.7 so that they work in this way. Your modified setup procedure should construct the extra data structures needed to convert characters into 8-bit sequences of zeroes and ones, and vice versa.

16. The decoding procedure shown in Section 9.7 and further described in Exercise 14 can be accelerated by keeping a map Decode which sends the start (say the first 8 bits) of the sequence s being decoded either into a pair [c, n], where c is the first character obtained by decoding s and n is the number of bits of s that represent this character, or into the node of the Huffman tree that is reached after walking down the tree in the manner determined by the first 8 bits of s, if these 8 bits do not lead us to a terminal node. Rewrite these routines by incorporating the suggested improvements.

17. The Huffman setup procedure shown in Section 9.7 can be made more efficient by saving the sequence of zeroes and ones describing the path from each Huffman tree node traversed. This information can be stored at the node. This makes it unnecessary for the setup procedure to traverse any edge of the Huffman tree more than once. Rewrite setup, incorporating this improvement.

18. The Huff_tree procedure shown in Section 9.7 can be made more efficient by using the treelike structures described in Section 9.7 to accelerate the aux- iliary get_min procedure. Rewrite Huff_tree and get_min, incorporating this improvement.

19. In playing a game, one may wish not only to win as much as possible, but also to win in the smallest possible number of moves. A recursion much like formula (1) of Section 9.8 can be used to determine the minimum number of steps which the winning player will need to bring the game to a successful conclusion. Find this recursion, and use it to develop a variant of the "alpha-beta" game-playing procedure which tells the winning player how to win as rapidly as possible and tells the losing player how to postpone his inevitable defeat as long as possible.

20. The "alpha-beta" game-playing program (see Est_A_can_win, Section 9.8) operates most efficiently if moves likely to return a large Est_A_can_win value are explored first. To guess in advance which moves these are likely to be, one can save the values calculated by Est_A_can_win during each cycle of play and use these values as estimates of move quality the next time it is the same player's turn to move. Write a variant of the Est_A_can_procedure which incorporates this improvement.

APPENDIX A

SETL Reserved Words

The words on the following page have a predefined meaning within a SETL program, and should only be used for their defined purpose.
and arb assert body case const continue domain else elseif

end exit find for forall from fromb frome if in

incs lambda less lessf loop max min mod not notin

npow null or otherwise package pow procedure program range rd

return rw sel stop subset then until use var when

while with wr class native cos sin tan acos asin

fix floor ceil float newat abs sign even odd sqrt log

exp str char unstr open close gets puts fsize nprint nprinta

print printa read reada reads binstr unbinstr get geta eof len

any notany span break match lpad rlen om atan tanh atan2

rany rnotany rspan rbreak rmatch rpad is_atom is_boolean

is_integer is_real is_string is_set is_map is_tuple is_procedure type

date time fexists system command_line abort abend_trap true false

APPENDIX B

Syntax Diagrams

Throughout this text, syntax diagrams are used to describe the grammatical structure of SETL constructs. For convenience, all syntax diagrams are collected in this appendix.

Each diagram describes the structure of a language construct. Each path through a given diagram traces one valid instance of the corresponding construct. The following conventions are used in drawing a syntax diagram:

(a) Syntactic classes are written in lowercase and enclosed in rectangular boxes.

(b) Terminal symbols of the language (delimiters and keywords) are in boldface and enclosed in rounded boxes.

(c) When the presence of a construct in a given diagram is optional (say the declarations in a program) then a path that bypasses the optional construct appears in the graph above that construct. For example, a procedure body includes the following:

(d) Repetition is indicated by a backward path that passes under the repeated construct. For example, a list of constants is a sequence of one or more constants, separat_d by commas. The corresponding syntax graph for the construct 'constant list' is the following:

and	arb	assert	body	case	const	continue	domain	else	elseif
end	exit	find	for	forall	from	fromb	frome	if	in
incs	lambda	less	lessf	loop	max	min	mod	not	notin
npow	null	or	otherwise	package	pow	procedure	program	range	rd
return	rw	sel	stop	subset	then	until	use	var	when
while	with	wr	class	native	cos	sin	tan	acos	asin
fix	floor	ceil	float	newat	abs	sign	even	odd	sqrt	log
exp	str	char	unstr	open	close	gets	puts	fsize	nprint	nprinta
print	printa	read	reada	reads	binstr	unbinstr	get	geta	eof	len
any	notany	span	break	match	lpad	rlen	om	atan	tanh	atan2
rany	rnotany	rspan	rbreak	rmatch	rpad	is_atom	is_boolean
is_integer	is_real	is_string	is_set	is_map	is_tuple	is_procedure	type
date	time	fexists	system	command_line	abort	abend_trap	true	false