Untitled

CHAPTER 8

Classes and Object-Oriented Programming in SETL.

8.1 Basic Concepts of Object-Oriented Programming

. Before presenting the details of the SETL object system, we will explain some of the general ideas and terminology of object oriented programming.

An object is defined by an internal collection of values representing its state and an associated collection of operations which can be used to sense and manipulate that state, and possibly to combine the object's internal state values with similar values drawn from other objects. Objects can be used to represent anything not directly or easily represented by one of SETL's built-in set-theoretic types. In particular, all of SETL's advanced facilities for user interaction, as presented in Chapter 10, are built using the object facilities explained in this section. These object facilities also serve to ease use of other external-language facilities availble to SETL by enclosing them in helpful 'syntactic wrappers'.

The operations defined on an object are called its methods, and are conceptually similar to procedures. But an important difference between a method and a procedure is that a method's name specifies an operation to be performed, but leaves the specific procedure to be used in performing the operation to be chosen dynamically, in a manner dependent on the type of the object to which the operation is to be applied. To illustrate the difference between these ideas, suppose we have two objects, one built around a bitmap and the other around an encapsulated Postscript graphic. Each of these objects might naturally have a method called draw, which draws it onto the screen (or elsewhere). We might then find ourselve working with a variable x that could be either a bitmap or Postscript graphic. Then the expression

x.draw( )

will call either the draw method defined on bitmaps or the method defined on Postscript graphics, depending on the value of x at the time the expression is evaluated. A method is therefore something like an overloaded procedure.

A method is invoked by passing it an object of a kind for which it was defined, along with any other arguments required by the method. This is much like a procedure call, except that the object both determines the procedure called and becomes an implicit argument to the call. The standard syntax for this is

x.method(parameter_1,..,parameter_k).

However, SETL also allows method calls to be written in infix notation, and, indeed, allows very aggressive overloading of all its built-in operations and much of its built-in syntax. The importance of this capablity is explained below.

A class or object class describes the data elements and methods common to a family of similar objects. Each of the (dynamically created, and possibly very numerous) objects described by a class is called an instance of the class. All these instances share a common set of methods (coded procedures), but each of these instances ordinarily contains its own private copy of some (typically of almost all) of the variables manipulated by these procedures. These variables, which exist in separate copies for each object of a class, are called the instance variables of the class; those shared by all the objects of the class are called the class variables of the class. Any of the instance variables, class variables, or methods defined for a class CL can be either be made public (visible to other programs, packages, or classes which use CL) or kept private (invisible to these other programs, packages, or classes.)

Classes can be related to underlying classes which aid in their definition by declaring a relationship called inheritance between them. This relationship integrates classes into an inheritance hierarchy. The inheriting class CL in such a relationship is called a subclass of any class from which it inherits, which conversely is called a superclass of CL. Subclasses are able to use data elements or methods in their superclasses as if they were declared internally. However, it is also possible for a subclass to override method and variable definitions made in one of its superclasses, simply by defining a method or variable with the same name again. Details of the conventions which apply are explained below.

8.2 Overall Structure of Class Declarations and Definitions

The rules governing the syntax of a class definition closely resemble those which apply to package definitions. A class is treated as a compilation unit, at the same level as a package or program. It is not possible to embed a class within a package, program or other class. As in the case of packages, the definition of a class consists of two parts, a class specification and a class body. (The specification and body need not occur in the same source file, but the specification must always be compiled before the body.) Each class specification declares the list of superclasses (if any) from which the class inherits, along with names of class-specific methods and data elements visible to units which use the class. The complete syntax of a class specification is:

	class  class_name;
	
			inherit  names_of_superclasses;  	-- there can be several of these
			data_declarations; 	   -- const and var declarations, as in packages and programs
			method_declarations;	-- like the procedure declarations in a package header

	end  class_name;		-- here, 'class_name' is optional

Each of these components of a class specification is described in a following section.

A class body defines some of the methods declared in the class specification, along with other methods and data elements visible only within the class. This can include a list of (class-private) instance and class variables. The syntax of a class body is:

	class body  class_name;

		use  names_of_other_classes_and_packages;  	-- there can be several of these
		data_declarations; 	   -- const and var declarations, as in packages and programs
		method_definitions;	-- like the procedure declarations in a package body
	
	end  class_name;		-- here, 'class_name' is optional

Note that ( for consistency) this syntax similar to that used for package specifications / package bodies; but there are a few differences:

any procedure in a class specification is treated and must be called as as a method .
inherit clauses appear within class specifications, not class bodies. Inheritance differs very significantly from use of another class or package, as explained in more detail in later.

8.3 Instance Variables and Class Variables

Data elements declared in connection with classes fall into one of several categories, based upon where they are stored and where they are visible. A data element declared in an object class or in one of the object classes from which it inherits can have a separate copy stored in each instance of the class, or it can be global to all instances of the class. This is the distinction between instance variables and class variables.

An instance variable is declared with the same syntax as is used for global variable declarations within packages or programs:

	var  name_1,name_2 := initializing_expression,...;

Note that all instance variables must be explicitly declared. Instance variables in the current instance (see 8.4) of any class C may be referenced by name only; and instance variables in other instances of the same or ancestor classes (these are the classes from which C inherits directly or indirectly) can also referenced using the more elaborate syntax illustrated by instance.name.

If initialization clauses are attached to the declarations of instance variables they are executed when an instance is created, but before any create method (see 8.5) is called.

Class variables are declared in a manner similar to that used for instance variables, but with the extra keyword class, as shown in

	class var  name_1,name_2 := initializing_expression,...;

If initialization clauses are attached to the declarations of class variables they are executed when a class is loaded by the SETL interpreter. This can happen at one of two times:

If a class is explicitly used by a program, the class is loaded at the start of execution.
the class can be loaded explicitly by applying the unbinstr procedure (see 6.2) or the getb procedure to one of its instances.

8.4 Methods

Methods are similar to procedures, except that many different classes can contain identically named methods. The standard syntax of a method definition is identical to that of a procedure definition, except that read-write and write-only parameters are not allowed.

	procedure name(parameters);
		data_declarations; 	   -- const and var declarations, as in packages and programs
		...
		statements
		...
		subprocedure_definitions;	-- like other imbedded procedure definitions 

	end name;				-- here, the  name is optional

Methods declared using this the standard syntax are invoked using the syntax

	 x.method_name(parameter_1,..,parameter_k).

Here x must be an instance of a class for which method_name is a method applicable to objects of that class. The method is invoked and the object x passed to it as an implicit parameter, referenced within the method using the reserved keyword self. Within any of an object's methods, any 'direct' reference to instance variables (i.e. any reference that is not preceded by an explicit object reference using the syntax instance.instance-variable) refers to the corresponding instance variable in the object x. Thus a direct reference

instance_variable to such a variable is synonymous with

self.instance_variable.

The more general syntax

instance.instance_variable

can be used to refer to the instance variables of other objects of the same class (or any of its superlasses), provided that these are visible at the point of call.

Similarly, the methods of a class may be called within a class body without the instance prefix., i.e. method_name(parameters) is synonymous with self.method_name(parameters) within such a class.

The B>self object passed to a method call as an implicit parameter is analogous to a read-write parameter of a procedure, in that the method can modify the object and pass the modified result back to the point of call; all other parameters of a method are always read-only. Internally, the SETL interpreter stores objects as tuples, where the first element of the tuple is a key indicating the class of the object and the remainder of the tuple stores the values of instance variables. When a method is called, the values of the instance variable are copied into the instance variables. When the method returns, those values are copied back into the tuple. Note that this protocol passes parameters by copying, not by reference.

8.5 Object Creation

An object is of a given class is created with a call to a create method that must be declared and defined within the class and its class body if any instances of the class are ever to be created. Such a method must appear in both the class specification and class body, since it must be visible outside the class body. (But if the class is used only as a superclass of other classes, no create method need be declared or defined for it.) Since create calls must specify the class to which the newly created object is to belong, they have a special syntax, which uses the name of the class as a surrogate creation routine name for that class. For example, if we have an object class rational_number (introduced to make rational numbers, which are not built into SETL, available), an object of that class might be created by writing

	three_fourths := rational_number(3,4);		-- create the fraction 3/4

As this example shows, a create method can accept parameters and use them to initialize the created instance. If this is done, the specified number of actual parameters must be attached to the creation call and will then be passed to create.

The number of actual parameters of the creation call must agree with the formal parameters in create. As part of the creation process, all of the initialization clauses attached to its instance variable declarations are executed.

Note that 'create' procedures need not return anything. This is because when create receives control, the object (self) to which it implitly refershas been created by the interpreter and the instance variables of this bject will already have been initialized (at least partially) by using to the initialization clauses on the instance variable declarations. The create procedure is invoked after this preliminary initialization, and has the responsibility of completing initialization (in complex cases this may require creation of various ancillary objects). When create terminates, the self is automatically returned; any other value returned by create will be ignored.

'Create' is not a reserved word. It is only special in that if a method named create is present in a class specification it will be called implicitly at the time an object is created by using the class name of the desired object.

8.6 Overloading SETL's standard syntax

SETL's built-in operators can be overlaoded, allowing them to invoke methods defined on classes. For example,we can define '+' and '*' operators for our hypothetical rational_number class, allowing us to write code like

	three_fourths := rational_number(3,4);	two_thirds := rational_number(2,3);	
	print(three_fourths * three_fourths + two_thirds * two_thirds);

Clearly the SETL interpreter will not know how to add or multiply two objects of some newly defined class unless an addition or multiplication method is explicitly defined for the class. To define a multiplication method for the class rational_number, something like the following the following could be placed in the class body:

	procedure self* x;				-- multiplication operator for rational numbers
		return rational_number(numerator * x.numerator,denominator * x.denominator);
	end;

We now detail the conventions that apply to such operator overloading.

8.6.1 Rules for overloading SETL's standard binary operators

To overload one of SETL's standard binary operators, for example '+' or max, we use the operator itself in a procedure header. Two overload methods can be defined for each of SETL's built-in binary operators. These have headers of the following forms:

	procedure self binary-op second_arg; 	or	procedure second_arg binary-op self;

The reson for allowing both of these two forms is that an object can appear either as the left or the right hand operand of a binary operator. SETL operations often expect both operands to be of the same type, but there are exceptions. For example, the built-in * operator can operate on integers and strings or integers and tuples, in which case the operands can appear in either order. We allow the two forms of binary operators to enable the same sort of thing for general objects. The first form above is used if the left operand determines the method used, in which case the left operand will become the current instance. Otherwise the second method will be used and the right operand will become the current instance.

In deciding how to process a binary operation the SETL interpreter gives precedence to the left operand. That is, it goes through the following steps before invoking a method of this kind:

If the left operand is an object and that object has an appropriate method, then that method is used and the left operand will become the current instance.
If the left operand is not an object or it doesn't have a left operand method but the right operand is an object with an appropriate method (defined in the second of the two forms shown above), then that method is used and the right operand will become the current instance.
Otherwise there is no appropriate method, so the program is aborted.

Each of these methods should return a value, although that is not enforced by the system. If no other value is returned, then OM will be returned. If an object of some class is to be returned the method will need to create, initialize, and then return a new instance.

Here is a complete list of SETL's standard binary operators.

	-    *    /    **    mod    min    max    with    less    lessf    npow

8.6.2 Rules for overloading SETL's standard unary operators

All of SETL's standard unary operators also allow associated methods to be defined. The headers for these methods have the following form

			procedure unary-op self;

In this case, the method used will always be determined by the class of the value of the operand, which will become the current instance. Each of these methods should return a value, otherwise OM will be used. If an object of some class is to be returned the method will need to create, initialize, and then return a new instance.

Here is a complete list of SETL's standard unary operators.

	-    #    arb    domain    range    pow

8.6.3 Rules for overloading SETL's standard relational operators

Many, but not all of SETL's relational operators can be overloaded. We do not allow overloading of the equality and inequality operators, since these primitive operations play a fundamental role in the definition of set and map membership. We also restrict the overloading of comparison operators, since the SETL code generator assumes that a < b if b > a. Therefore, we allow a < method but do not allow a > method. For each operation we invoke the < method, but in the expression a < b, a will become the current instance, while in a > b, b will become the current instance.

Because of these restrictions, only < and 'in' can have associated methods. The < method allows two forms (left and right) just like other binary operators and follows the same rules. The 'in' operator also has two forms but we give precedence to the right operator when determining the method to be invoke for an in expression.

Any method associated with < or 'in' in this way must return either true or false.

The < method will be called when any of the expressions: a < b, a <= b, a > b, or a >= b is encountered. The expression a <= b is interpreted as a < b or a = b.

8.6.4 Rules for overloading SETL's standard map, tuple, and string slice operations

SETL provides four expressions for refering to portions of maps, tuples or strings, namely f(x), f{x}, f(i..j) and f(i..). Each of these expressions can appear in both left and right hand side contexts. The ability to overload these syntactic constructs is particularly valuable, since it enables us to define new own aggregates organized any way we like, while retaining the ability to access components of those aggregates elegantly.

Each of these expression forms allows two associated methods, one for appearances of the form in left hand contexts and the other for right hand appearances. The syntax of the headers for these methods is as follows:

procedure self (id1);

procedure self(id1) := id2;

procedure self{id1};

procedure self{id1} := id2;

procedure self(id1..id2);

procedure self(id1..id2 ) := id3;

procedure self(id1..);

procedure self(id1..) := id2;

Each of the methods used in right hand side contexts (those whose headers do not include an := symbol) should return a value. Otherwise OM will be returned. The methods for left hand contexts should not return anything, but should use their rightmost argument to modify the current instance.

8.6.5 Rules for overloading SETL's standard extraction/deletion operations

SETL provides three extraction/deletion operations:

	from, fromb, and frome,

and object methods can be defined for any of these. The syntax of the corresponding method headers is:

procedure from self;

procedure fromb self;

procedure frome self;

Notice that there is no left operand in these headers, even though each is a binary operator. The left operand in a deletion operation is written but not read. Whatever value these methods return will be assigned to the left operand as well as the target operand. Each must return a value, or OM will be used.

8.6.6 Rules for overloading SETL's standard iteration constructs

SETL provides several syntactic constructs which call for iteration over an aggregate. For example, on encountering the expression

{x in S | x < 5}

the interpreter will iterate over S, screening each element by applying the condition x < 5 and collecting all values which satisfy that condition. Iterators are used in set and tuple forming expressions, 'for' loops, and quantified expressions. Three general forms of iterators are provided:

expression1 in expression2

expression1 = expression2 (expression3)

expression1 = expression2 { expression3}

Note that the expression y = f(x) is equivalent to [x,y] in f, and so does not enter into the following discussion.

SETL allows two pairs of built-in methods, corresponding to the first and third of these syntactic constructs. These are

procedure iterator_start; procedure iterator_next;

procedure set_iterator_start; procedure set_iterator_next;

When the interpreter encounters code requiring an iteration over an object, it calls the iterator_start or set_iterator_start method, depending on whether the iterator was of the first or third form above. Then it repeatedly calls iterator_next or set_iterator_next to get successive elements of the object.

The iterator_start and set_iterator_start methods need not return a value. They are only used to let an object initialize an iteration. The iterator_next and set_iterator_next methods should return the next element x in the object, including it in a tuple [x] of one component if the iteration is to continue, or should return OM if iteration is to terminate.

If the iterator form is y = f(x), then the first pair of iterator methods will be used, but each value retured must be a pair, so each return return statement within the method will look something like this:

return [ [x,y] ];

notice the double brackets. The outer tuple indicates that a value was found, and the inner tuple is the pair of values required in this form of iteration.

If the iterator expression is y = f{x} then the second pair of iterator methods will be used. The return values must obey the same rules as for y = f(x) iterators.

None of the method names appearing in this subsection are reserved words. If not used as iterator methods, they can have any number of parameters and return anything you like. But if they are to be used for iteration, they must conform to the rules above, or the program will abort.

8.6.7 Rules for overloading SETL's standard string conversion procedure

Objects are printed by first calling the built-in procedure str, then printing the string that results. A default string is alwaysproduced when str is applied to an object of a user-defined class, but this is mainly useful for debugging. (It lists all the instance variables, but in an ugly format.) This default string conversion can be overridden with something more elegant by furnishing a class with a method having the name 'selfstr', declared with the following header

procedure selfstr;

If this method is provided for a class it will be called when str is applied to objects of that class. 'selfstr' can return any value, but ideally should return a printable string version of the object.

8.7 Testing An Object's Type

The type of an object can be determined using SETL's built-in type procedure. The value returned will be the name of the object's class as an upper case character string. SETL is not case sensitive, but always keeps names as upper case.

8.7.1 SETL Objects Have Value Semantics

Suppose that we introcuce the following simple object class and then run the test program shown:

class simple;             -- simple demonstration object class

    procedure create(n);         -- creates object with given internal value
    procedure setval(n);         -- sets internal value
    procedure selfstr();        -- string conversion routine

end simple;

class body simple;         --  simple demonstration object class

    var val;            -- internal value
    
    procedure create(n); val := n; end create;        -- creates object with given internal value
    procedure setval(n); val := n; end setval;        -- sets internal value
    procedure selfstr(); return str(val); end selfstr;        -- string conversion routine
    
end simple;

program test;         -- 
    use simple;
    x := y := simple(1); x.setval(2); print(x," ",y);
end test;

The value printed by the test code is

2 1

This is because SETL objects introduced by class definitions, like built-in SETL objects, have strict value semantics: the operation x.setval(2), which changes the object x, must by definition have no effect on the object y, any more than it would have if we write

x := y := []; x with:= 2;

For this reason the SETL system creates a fresh copy of x before executing x.setval(2), preventing this operation or others like it from affecting y.

The same effect is seen in the test code

program test;         -- 
    use simple;
    x := {y := simple(1)}; y.setval(2); print(x," ",y);
end test;

which for the same reason produces the output

{1} 2.

This is not always the effect wanted (and one wants this effect less often in object settings than in dealing with orinary SETL values.) Suppose, for example, that our objects represnet the employees of a multi-department firm, whose departments are then represented by sets of employees, and that an operation like x.setval(n) is used to change an employee's salary when they get a raise. SETL's value-emeantics rule would then imply that this operation would have no effect on the summed value of salaries over the set of all employees in a department, sice the set of such employees would always contain objects with unchanged salary values. It is as if the operation x.setval(n) created an entirely new employee, with the new salary, while leaving the former employee unchanged in all contexts which the name 'x' does not directly identify. This strict value-semantics convention makes'salary' part of an employee's very identity, clearly not what is wanted. These are situations in which pointer semantics, not value semantics, should be employed.

To create SETL objects which display pointer rather than value semantics in regard to certain modifyable attributes, one can set the attribute items stored internally in the objects to unchanging atoms p, created by the creation routines ofthe objects themselves, and then attach the attribute values to these attribute atoms (which accordingly function as 'pointers') using the special global SETL map '^'. To modify our example class in this way we would write

class simple;             -- simple demonstation object class

    procedure create(n);         -- creates object with given internal value
    procedure setval(n);         -- sets internal value
    procedure selfstr();        -- string conversion routine

end simple;

class body simple;         --  simple demonstation object class

    var val;            -- internal value
    
    procedure create(n); val := newat(); ^val := n;  end create;        -- creates object with given internal value
    procedure setval(n); ^val := n; end setval;        -- sets internal value
    procedure selfstr(); return str(^val); end selfstr;        -- string conversion routine
    
end simple;

Now the output produced is

{2} 2,

showing that our modified objects have pointer semantics.

8.8 Methods As First Class Objects

A method can be used as a first class object just as a procedure can, but only when an implicit instance variable is first attached to it. The procedure-like value of a method which would be called as

x.method(parameter_1,..,parameter_k)

can be captured by writing

result := x.method;

Such a value can then be called in the same way as any other procedure, e.g. by witing

y := result(parameter_1,..,parameter_k).

This expression will invoke 'method' with the value of x as the current instance.

This semantic rule is consistent with that applied when SETL forms procedure closures. Recall from Section XXX that whenever a procedure value is returned that otherwise would reference some variable defined outside its own body, SETL saves the environment of that procedure, which is to say saves as much as necessary of the current activation of all enclosing procedures. Since, inside a class body. the current instance is part of a method's environment, we bind the current instance to the method and save the combination as a closure-like procedure value. It follows that object methods cannot be used as first-class objects until an associated object instance is bound with them.

8.9. A first example: calculation with rational numbers.

Suppose that we need to do an extensive calculation using rational numbers, which SETL does not provide as built-in objects. SETL's object class facility allows us not only to introduce objects of this kind, but to specify that the standard infix operators +, -, *, /, and also comparison operators like < and <= will be used in standard fashion to combine and compare rational numbers. Once this is done we can write arithmetic expressions for rational numbers in just the same way as we do for integers.

The following class definition accomplishes this. Since fractions are represented mathematically simply as pairs of integers, the way to set up this class is rather obvious. Each fraction (ie. object of 'rational' type) is defined by two internal instance variables num,den (its numerator and denominator, from which we always eliminate any common factor, so that they are always in 'lowest terms'.) The codes for the various arithmetic operators and comparisons seen in the class body simply reflect the standard mathematical rules for operating with fractions. All these operators are written so that they can combine either fractions with fractions, or fractions with integers, in either order. Whenever a new fraction is formed, we use the internal 'simplify' routine to reduce it to lowest terms. The 'str' procedure is redefined so as to give rational numbers their standard printed form 'num/den'. Since division of rationals by integers is available, the create routine simply takes a single integer numerator as parameter, and we build fractions like 2/3 by writing 'rational(2)/3'.

class rational;                                 -- rational number class
        
     procedure create(n);                    -- converts the integer n to the corresponding fraction n/1
     procedure ceil();                               -- ceiling of rational
     procedure floor();                              -- floor of rational
        
end rational;
        
class body rational;
        
        var num,den := 1;               -- numerator and denominator, always in lowest terms
        
        procedure create(n);    -- converts the integer n to the corresponding fraction n/1
                num := n;       -- set the numerator
        end create;
        
        procedure set_denom(n); den := n; end set_denom;
        
        procedure selfstr;              -- string conversion for fractions
                return str(num) + "/" + str(den);
        end selfstr;
        
        procedure self + r2;    -- addition routine for fractions, and for fraction + integer
        
                if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
                n1 := num * den2 +  num2 * den;
                n2 := den * den2;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure r2 + self;    -- addition routine covering the case integer + fraction
                
                if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
                n1 := num * den2 +  num2 * den;
                n2 := den * den2;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure self - r2;    -- subtraction routine for fractions, and for fraction - integer
        
                if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
                n1 := num * den2 -  num2 * den;
                n2 := den * den2;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure r2 - self;    -- subtraction routine covering the case integer - fraction
        
                if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
                n1 := num2 * den - num * den2;
                n2 := den * den2;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure self * r2;  -- multiplication routine for fractions, and for fraction * integer
        
                if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
                n1 := num * num2;                       -- we simply multiply the numerators and the denominators separately
                n2 := den * den2;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure r2 * self;  -- multiplication  routine covering the case integer * fraction
        
                if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
                n1 := num * num2;                       -- we simply multiply the numerators and the denominators separately
                n2 := den * den2;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure self/ r2;    -- division  routine 
        
                if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
                n1 := den2 * num;     -- we simply invert and then multiply the numerators and the denominators separately
                n2 := num2 * den;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure r2 / self;    -- division  routine covering the case integer / fraction
        
                if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
                n1 := den * num2;           -- we simply invert and then multiply the numerators and the denominators separately
                n2 := num * den2;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure self ** n;            -- integer power routine
        
                n1 := num ** n;         -- we simply take corrresponding powers of the numerator and the denominator
                n2 := den ** n;
        
                new_rational := rational(n1); new_rational.set_denom(n2); 
                return new_rational.simplify();
        
        end; 
        
        procedure self < r2;           -- rational comparison
                
            if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
            return (num * den2 -  num2 * den) < 0;
                        
        end; 
                
        procedure r2 < self;    -- rational comparison covering the case integer <= fraction
                
            if is_integer(r2) then num2 := r2; den2 := 1; 
                        else num2 := r2.num; den2 := r2.den; end if;
            return (num2 * den -  num * den2) < 0;
                        
        end; 
                 
        procedure ceil();       -- ceiling of rational
           quot := num/den; 
           return if (rem := num -  quot * den) <= 0 then quot else quot + 1 end if;
        end ceil;
                 
        procedure floor();      -- floor of rational
           quot := num/den; 
           return if (rem := num -  quot * den) >= 0 then quot else quot - 1 end if;
        end floor;
               
        procedure simplify();   -- reduces a fraction to lowest terms
        
                if den = 0 then
                        abort("RATIONAL ARITHMETIC ERROR: division by zero " + str(num) + "/" + str(den)); 
                end if;
        
                comden := gcd(num,den);         -- find common denominator
                if (den := den/comden) < 0 then num := -num/comden; den := -den;  else num := num/comden; end   if;
        
        return self;
        
        end simplify;
        
        procedure gcd(n1,n2);   -- creates common denominator by Euclid's algorithm
        
                m1 := abs(n1); m2 := abs(n2);
                if m1 < m2 then [m1,m2] := [m2,m1]; end if;
        
                while m2 /= 0 loop
                        m1 := m1 mod m2; 
                        [m1,m2] := [m2,m1];
                end loop;
        
                return m1;

        end gcd;

end rational;

The following short program tests the 'rational' class defined by the preceding code.

	program test;		-- test of rational class
		use rational;
			
		two_thirds := rational(2)/3; 
		print(two_thirds - two_thirds = rational(0));	-- test object equality routine 
		
		print(y := two_thirds ** 4,"\n",two_thirds ** 1000);
		
		print("\n",y + (one_half := rational(1)/2)," ",(y/two_thirds - one_half) * one_half);
		                
		print("\n",fraction_set := {two_thirds ** n: n in [0,10]}," ", +/fraction_set);
		                
		print("\n",two_thirds.ceil()," ",two_thirds.floor());
		print("\n",minus_1_third := two_thirds - 1); 
		print("\n",minus_1_third.ceil()," ",minus_1_third.floor()); 
		                
		print("\n",two_thirds > (xx := rational(65)/100)," ",two_thirds <= xx);
	
	end test;

The fraction two_thirds ** 1000 turns out to be

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468 25187145285692314043598457757469857480393456777482423098542107460506237114187795418215304647498358194126739876755916554394 6077062914571196477686542167660429831652624386837205668069376 /
13220708194808066368904552597521443659654220327521481676649203682268285973467048995407783138506080619639097776968725823559 50954582100618911865342725257953674027620225198320803878014774228964841274390400117588618041128947815623094438061566173054 08667449050617812548034440554705439703889581746536825491613622083026856377858229022841639830788789691855640408489893760937 3242171846359938695516765018940588109060426089671438864102814350385648747165832010614366132173102768902855220001

a fact you may wish to remember.

8.10.1 Advantages of Object-oriented Programming; Inheritance.

Object-oriented programming facilities like those of SETL embody a deep complex of ideas, which like all deep ideas present us with many possibilities, all deserving careful consideration. In the first place, these facilities invite us to view the first stages of any programming project, particularly large and challenging projects, as a matter of designing one or more families of objects, and of operations which allow these objects to interact with and act upon each other, often in ways which generate new objects of the same type. This approach is seen at its most successful when we use mathematical objects, like vectors, matrices, polynomials, trees, graphs etc., whose definitions and associated operations have been polished by years of careful study and use.

Multimedia application developments can also make good use of objects which encapsulate fragments of behavior useful for the construction of larger objects and of full interactive projects. (This is the way in which the SETL interactive environment described in Chapter 10 has been organized.) Objects useful in such settings include graphics which change their appearance or execute associated code fragments when approached or clicked, sliders and knobs which serve as sources of numbers, dials and bars which can display these numbers in various useful forms, collections of small dots which can be dragged to adjust the positions of larger and more complex objects, etc. This idea then extends itself naturally to the construction of second-level objects designed to ease management of large collections of first-level objects: folders in which other objects can be stored, windows in which they can be viewed in various ways, and so forth. In an object-oriented one first specifies the nature and behavior of the classses of objects (often called 'controls', 'widgets', or even 'mega-widgets') that will be used to build an interactive appication, then realizes them using the facilities of the language in which one is working, and finally uses the resulting library ofobjects to build the full application desired. This level-by-level approach decomposes complex design problems in useful ways and can often help one to elaborate nicely modular, logically compelling designs. Several examples illustrating this idea appear in Chapter 10.

Large programs often develop by progressive transformation and elaboration of simpler initial versions, and here also SETL's object facilities provide us with very useful tools. Suppose, for example, that we aim to develop some kind of large database, which eventually will have to serve as repository for many millions of records. In a tiny, very first prototype, this may simply consist of a family of SETL mappings which collectively store whatever information the database is to hold, for some test collection of a few hundred records. These initial mappings can be saved to disk using the standard binary read/write pair putb and getb, giving us an immediately functional mini-database in which all database capabilities not involving size or efficiency considerations can be designed and implemented. Then, at such time in development at which it becomes appropriate, we can use SETL's object facilities in the following way to scale up the database:

Introduce one or more classes of objects which have the same external interfaces as the SETL mappings originally used to create the database. That is, objects of these classes must support all the operations f(x), f(x) := y; f with x, f + g, etc. applied to the corresponding map objects in the original database prototype, and these operations must behave in the same way. That is, after an operation f(x) := y the expression f(x) must yield the value y, etc. This is to say that our new objects must behave like maps in every way visible to our original database prototype. However, their internal structure and implememtation may be entirely different, and they may support additional operations allowing important extensions of the prototype database functionality.
The design of the new objects should ensure that the reqired level of efficiency will be attained when the database is scaled up. For example, the map-like objects introduced may be implemented in two parts, a main part held on disk (to allow scaleup to very large size) and a smaller dynamically varying part, managed automatically in response to the operations applied to the map objects comprising the database. The part of the database held on disk can consist partly of various auxiliary indices which serve to locate objects rapidly when these are called for. The structures used may also be designed to maintain key parts of their history of change (for debugging or for post-auditing), to support efficient reversion (making it possible to back out of partially completed transactions), to guarantee reconstructabliity after system crash, etc. Parts of the database can be left resident on some 'home' computer, while other parts are moved to remote servers accessed via Internet sockets. All of this internal reworking can leave the original map-like external syntax and semantics of the evolving database objects unchanged, except that new operations, supporting new capabilities, can be added. This will ensure that
- The original database prototype always remains functional as it is scaled up.
- new versions of the database can be checked by comparing their contents with the contents of prior versions, after equivalent manipulation sequences.
To attain still higher efficiency, particularly critical operations can be translated into native C code, implemented by one or more native packages of the kind described extensively in Chapter 9, which the object classes used encapsulate in interfaces identical to those of the less efficent SETL operations which they replace.

The way in which objects can use overlaoded syntax serves to reveal and emphasize analogies among them which can ease both their design and their use. A section of code will act in the intended way whenever the objects appearing in it support the operations applied in the code, provided that these operations satisfy the logical assumptions concerning them implicit in the code. The object-oriented programming approach exploits this fact by eliminating unnecessary syntactic variation, thereby allowing direct code re-use in situations where extensive rewriting might otherwise be necessary. At an even more primitive level the appearance of an operatior sign like '+' hints that the operator being applied is some sort of constructive 'union' which might be associative (i.e. satisfy (a + b) + c = a + (b +c), and so allow extended 'sums' to be formed recursively), the appearance of the sign '*' hints that some notion of commonality or repetition may be involved, the appearance of y := f(x) hints that an operation f(x):=y may also exist, etc.

Operator overloading also helps us avoid unnecessary syntactic proliferation, which if unchecked can rapidly generate large, hard-to-remember programming vocabularies. Suppose, for example, that we will be working with some class of graphic objects whose appearance is determined by a long list of graphical attributes, e.g. color, size, position, orientation, size and orientation of various subfeatures, etc. It will sometimes be convenient think of these objects as a kind of extended map from attribute names to attribute values, with values which can be set by assignments having forms like

obj("color") := "red"; obj("position") := "top"; obj("orientation") := "horizontal";

but which change the graphical appearance of the objects to which they are applied. Writing things in this way cuts our programming vocabulary in half and makes it easier to remember, since we can write

obj("color") := "red"; and x := obj("color");

instead of something like

obj.setcolor("red"); and x := obj.getcolor();

which is distinctly less predictable, since it might for example be

obj.set_color("red"); and x := obj.get_color();

or even

obj.set_color("red"); and x := obj.color();

A final advantage of the object-oriented prorammming approach is that it safely packages groups of related updates whose seperation would allow inadvertent, bug-generating omisssions or failures of correspondence. Suppose, for example,that we are working with a collection of tuple-like objects but using them as serial files, from which successive objects are read by an operation that might be written as

f.next()

Instead of packaging these as objects one might choose to work directly with their underlying representation, which might, e.g, consist of a tuple t and a tuple index ix, thus simply converting

x := f.next() to x := t(ix +:= 1);

which seems safe enough. But if our code is working with several such objects f_1, f_2,... and occasionally needs to exchange them, the very direct

[f_1,f_2] := [f_2,f_1];

of a piously object-orienbteed version becomes

[t_1,t_2] := [t_2,t_1]; [ix_1,ix_2] := [ix_2,ix_1];

which is distictly harder to get straight since

it requires us to remember and correctly type four names instead of two.
the human psychological tendency to 'take the part for the whole', a common source of bugs, makes it easy to omit the second assignment [ix_1,ix_2] := [ix_2,ix_1];

If this ever happens it may introduce an elusive bug ,whose initial symptom may be the improperly repeated appearance or the disapperance of 'file' elements, but only after the 'files' have been exchanged. If this bug appears as part of a complex code, it can easily take several hours to track it down. In the object-oriented form of the same code this bug could never have occurred at all.

8.10.2 More about Inheritance

Object classes OC built using the SETL facilities can either use other classes or inherit from them. use is most appropriate when objects of type OC are built from other kinds of objects employed as parts, as, for example, a table might be built from four legs and a top. Inheritance is most appropriate when objects of one type constitute a kind of objects of some other type, possibly with some added or revised features. For example, a folding table is a kind of table, while a house-trailer is a kind of house and a kind of truck (in cases like this, inheritance from multiple ancestral classes may be appropriate.)

The inheritance relationship among classes resembles the use relationship among packages, but is stronger. When a package P uses a package P', only those procedures and variables of P' which have been made public by being declared in the package specifier of P' (rather than in its body) are visible within P. But when a class C' is inherited, by a class C, all its instance variables and methods become visible in C, and beyond this become a part of C, rather as if the C' were textually inserted into C. (The difference between inheritance and textual inclusion is that multiply defined inherited methods are hidden rather than generating errors. Moreover such hidden methods can be accessed by prefixing the name of class from which they originate to the method name.)

The syntax of an inherit clause is:

(diagram here)

Inherit clauses referencing a class C' are placed in the class specification of the classes C which are to inherit from C', after the header line of C and before any other declarations. Each of the identifiers appearing in an inherit clause must identify some class already available in the SETL library. All variables and methods of each superclass C' from which C inherits will be brought into C, unless this is prevented by thee redefinition of someone of these methods within the body of C, or by conflicts among names of variables and/or methods belonging to the classes from which C inherits. The rules for resolving conflicts are as folllows:

No variable names may be redefined.
Method names follow similar rules to packages: A local definition overrides an inherited one. If there are conflicts among inherited names, all are hidden. Hidden names are accessible with the syntax: superclass.method-name. Hidden names deriving from names in superclasses are only accessible in this way within the bodies of classes inheriting from them.

Inheritance also makes names available transitively, whic use does not. Suppose for example that a, b, and c are classes, and that a uses b and b uses c. Then nothing from c is visible in a (unless of course a also uses c). Things are different with inheritance, which is similar to textual insertion and in particular is recursive. If a inherits b and b inherits c then everything in c not blocked by a redefinition in a or b is visible in a. Even c's otherwise hidden methods in c are accessible in a, albeit by use of the expression c.method-name.

An example. The following small example illustrates the use of inheritance. It reflects that fact that for many kinds of algebraic objects for which addition and subtraction operations are both defined. and for which there is a 'zero' object (fractions, matrices, and polynomials are all examples), subtraction can be expressed in terms of addition and the monadic minus operation by use of the familiar identity

a - b = a + (-b).

Also addition is commutative and thus satisfies the identity

a + b = b + a.

The code shown below introduces the (relatively abstract) class of all objects for which these identities are avalable, under the name 'algebraic'. For objects of this class, it defines a - b as a + (-b) (which assumes that a summation operator and a mondic minus operator will be made available), and also defines addition with an algebraic object as second rather than first argument by using the identity a + b = b + a to revese the argument order. Then by declaring the class int_obj to inherit algebraic, these same extensions become available for int_objs. Likeise by declaring the class matrix to inherit algebraic, we make these extensions available for matrices

The first test class of objects to which this trick is applied are simply the integers, wrapped as objects. This is of course a useless family of objects, which we introduce simply to illustrate the techniques involved.

class algebraic;            -- 'algebraic object' - a relatively 'abstract' class
        -- no objects of this class  will ever be created directly, so no 'create' routine is needed
end algebraic;

class body algebraic;

    procedure self - b; return self+ (-b); end;    -- this assumes that an addition routine is avalable
    procedure b + self; return self+ b; end;       -- this assumes that an addition routine is available
    
end algebraic;

class int_obj;                -- integers, as  a sample class of algebraic object
    inherit algebraic;        -- we inherit from the more general 'algebraic' class
    procedure create(n);
end int_obj;

class body int_obj;           -- integers, as  a sample class of algebraic object
    var val;
    procedure create(n); val := n; end create;
    procedure selfstr; return str(val); end selfstr;

    procedure self+ b; return int_obj(val + if is_integer(b) then b else b.val end if); end; 
    procedure -self; return int_obj(-val); end; 
    
end int_obj;

class  matrix;                -- matrices, as a second sample class of algebraic object
    inherit algebraic;        -- we inherit from the more general 'algebraic' class
    procedure create(m);
end matrix;


class body matrix;           -- matrices, as a second sample class of algebraic object
    var val;
    procedure create(m); val := m; end create;
    procedure selfstr; return "" +/ ["|" +/ [pad_8(c): c in row] + "|\n": row in val]; end selfstr;
    procedure pad_8(c); stg := 8 * " " + str(c); return stg(#stg - 7..); end pad_8;    -- pad value to  8 characters 

    procedure self+ b; return matrix([[c + b.val(i)(j): c = row(j)]: row = val(i)]); end;   -- matrix addition
    procedure -self; return matrix([[-c: c = row(j)]: row = val(i)]); end;                  -- negative matrix
    
end matrix;

The following mini-program can be used to test the preceding code.

program test;
	use int_obj,matrix;
    print(30 + int_obj(333)); print(int_obj(30) - int_obj(333)); print(int_obj(30) - 333);
	print(m:= matrix([[1,2],[3,4]])); print(-m); print(m2 := m  + m); print(m2 - m);
end  test;

It produces the output

	
			363
			-303
			|       1       2|
			|       3       4|
			
			|      -1      -2|
			|      -3      -4|
			
			|       2       4|
			|       6       8|
			
			|       1       2|
			|       3       4|

'Virtual' Methods. Another, almost equivalent but often preferred way of writing the 'algebraic' class that appears in the preceding example is

class algebraic;            -- 'algebraic object' - a relatively 'abstract' class
        -- no objects of this class  will ever be created directly
end algebraic;

class body algebraic;

    procedure self - b; return self+ (-b); end;        -- this assumes that an addition routine is available
    procedure b + self; return self+ b; end;           -- this assumes that an addition routine is available
    procedure self+ b; end;                                   -- this empty method is 'virtual'
    procedure -self; end;                                     -- this empty method is 'virtual'
    
end algebraic;

Here we have simply made explicit that fact that classes which inherit from 'algebraic' are expcted to provide the methods self + b and -self. This is done by including the (otherwise useless) empty procedures self + b and -self in the body of the class algebraic. Empty (or simple default) procedures included in this way merely to announce that they will subsequently be over-written are often called virtual methods. They give the notion of inheritance what might be called a 'Japanese' as contrasted with the more familiar 'American' flavor: in the U.S., you can inherit your parent's wealth, but are not required to assume your parent's debts; in Japan you can traditionally inherit parental debts which you are required to pay. Virtual methods are like debts which every class inheriting from a given superclass is required to pay.

Next we give a third version of this same example, which is contrived to maximize separation of the several strands of thought and technique which enter into into it:

The 'algebraic' class is expanded, to provide the variable 'val' storing an object's value, and it also assumes that each of its subobjects will initialize four additional procedure-valued variables, called 'sum', 'invert', 'self_str', and 'maker'. It uses these to define all the operations a + b, -a, and a - b that it provides and to convert objects to strings. This class assumes that 'sum', 'invert', and 'maker' will be made available to all the operations on values that it requires, but makes no assumption converning the actual form of these objects. 'algebraic' is therefore a pure syntactic wrapper class.
Versions of the 'sum', 'invert', and 'self_str' routines appropriate for integer and for matrix objects are supplied by two packages called 'intops' and 'matrix_ops'. These routines, which do the innermost algebraic work for integer and matrix objects respectively, use the internal representations of these objects (which they define), and so do not reflect the fact that they will be called in an object-oriented setting. The are pure semantic packages, solely responsible for defining the form of the objects being manipulated and the way in which operations on them are implemented.
The two small classes 'int_obj' and 'matrix', each of which is just a few lines long, have sterotyped forms and so could be generated automatically. They serve to link their class names to the packages which do their semantic work and, for the rest, inherit all their operations from the 'algebraic' class.

This third version of our code therefore breaks the programming needed to introduce the desired operations into three cleanly separated parts: (a) a semantic wrapper class, (b) packages of procedures to do the real work, and (c) a few auxiliary mini-classes to link (a) to (b).

class algebraic;            -- 'algebraic object' - a relatively 'abstract' class, encapsulating  syntax
        -- no objects of this class  will ever be created directly
end algebraic;

class body algebraic;
    var val,sum,invert,maker,self_str;                                        -- must be avaialable
    
    procedure self - b; return maker(sum(val,invert(if is_integer(b) then b else  b.val end if))); end;
		        -- difference
    procedure b + self; return maker(sum(if is_integer(b) then b else  b.val end if,val)); end;             
				-- sum in alternate order
    procedure self + b; return maker(sum(val,if is_integer(b) then b else b.val end if)); end;                
				-- sum
    procedure -self; return maker(invert(val)); end;
                 -- mondic minus
    procedure selfstr; return self_str(val); end selfstr;
                 -- print form
    
end algebraic;

package intops;                -- integer operations semantic package; does the innermost algebraic work

    procedure sum_op(val,b);             -- addition 
    procedure invert_op(val);            -- unary minus
    procedure selfstr_op(val);           -- string conversion

end intops;

package body intops;        -- integer operations semantic package; does the innermost algebraic work

    procedure sum_op(val,b); return val + b; end sum_op; 
    procedure invert_op(val); return -val; end invert_op; 
    procedure selfstr_op(val); return str(val); end selfstr_op; 

end intops;

package matrix_ops;                -- matrix operations semantic package; does the innermost algebraic work

    procedure sum_op(val,b);             -- addition 
    procedure invert_op(val);            -- unary minus
    procedure selfstr_op(val);           -- string conversion

end matrix_ops;

package body matrix_ops;        -- matrix operations semantic package; does the innermost algebraic work

    procedure sum_op(val,b); return [[c + b(i)(j): c = row(j)]: row = val(i)]; end sum_op; 
    procedure invert_op(val); return [[-c: c = row(j)]: row = val(i)]; end invert_op; 

    procedure selfstr_op(val); return "" +/ ["|" +/ [pad_8(c): c in row] + "|\n": row in val]; end selfstr_op; 
    procedure pad_8(c); stg := 8 * " " + str(c); return stg(#stg - 7..); end pad_8;    -- pad value to  8 characters     

end matrix_ops;

class  int_obj;                -- integers, as  a sample class of algebraic object
    inherit algebraic;        -- we inherit from the more general 'algebraic' class
    procedure create(n);
end int_obj;

class body int_obj;            -- wrapper class, combining syntax with  semantics
    use intops;                -- use  package of integer operations

    procedure create(n); val := n; sum := sum_op; invert := invert_op; maker := make_op; self_str := selfstr_op; end create;
    procedure make_op(n); return int_obj(n); end make_op;
    
end int_obj;

class  matrix;                -- matricess, as a second sample class of algebraic object
    inherit algebraic;        -- we inherit from the more general 'algebraic' class
    procedure create(m);
end matrix;

class body matrix;
    use matrix_ops;            -- use  package of matrix operations

    procedure create(m); val := m; sum := sum_op; invert := invert_op; maker := make_op; self_str := selfstr_op; end create;
    procedure make_op(m); return matrix(m); end make_op;

end matrix;

Inheritance is of course most interesting when wealth rather than debt can be inherited. In the SETL context, this corresponds to situations in which a superclass can provide methods which some inheriting subclass can use without change. Unfortunately this happy situation is relatively rare. Ordinarily when we change an object's internal representation, the code for the operations applied to it will need to undergo at least some small modifications, and so cannot simply be inerited. Even in the 'house-trailer'-like case in which it is a natural to think of a class as a composite of two pre-existing classes ('house' and 'truck'), the operations applying to either aspect of the composite will generally need to refect the presence of its other portion in some way, making use, which allows this, more appropriate than inheritance, which does not. This is all the more true when an object multiple copies of a pre-existing object must enter into a composite object, as in a table with four legs.

The kind of situtation which the preceding example illustrates typifies cases in which SETL finds inheritance most valuable. These are cases in which higher level operations can be defined in some unchanging functional manner from constants and procedures which a subclass must supply, just as each of the subclasses 'matrix' and 'int_obj' supplies the sum, invert, maker, and self_str operations used by 'algebraic' to define its infix and prefix algebraic operations. Where even this level of invariance does not hold, method codes need to be rewitten in any case, so inheritance is less useful. However, it is possible (though a bit unusual) for a class C to both use and inherit the same class C'; this can be done to give C direct access to the interrnals of C', rather than requiring C to access these by using the methods of C'.

It is worth noting, however, that the mechanism of inheritance is considerably more important in certain languages other than SETL, for example C++ and Java. This is because these are statically typedlanguages, rather than being untyped or dynamically typed, as SETL is. This allows code written in these languages to be compiled more agressively into machine-like forms than is possible for SETL, which needs to check the type of each object with which it deals to determine what the object is, how to apply a specified operation to the object, and whether the operation is applicable at all. Statically typed languages aim to make it syntactically impossible for operations to be applied to objects which do not support them. Thus, in these languages, if we want to allow some of the operations defined for one kind of object to be applied to objects of some loosely related kind, it may be easiest to have them both inherit from some abstract class which supports all the methods of interest for either kind of object, at least nominally. This encourages the introduction of abstract classes C, full of empty virtual methods meant simply to be redefined in every subclass of C which really needs to use these methods. SETL, which does without the syntactic restrictions which this kind of artifice evades, has less use for inheritance. (But such use of inheritance may have documentation value even for SETL.)

8.11 A 'Coordinated Iteration' Object Class

The processing pattern seen in the 'merge' subroutine of 'merge_sort', namely coordinated iteration over two or more sequences of objects sorted into corresponding order, is common enough for an object class which encapsulates it to be of some interest. The code below realizes such an object class, which the code names 'coord_iter'. Objects of this class are created from a tuple of other objects, each of which must at least support a notion of iteration over its elements by presenting two methods, essentially the 'iterator_start' and 'iterator_next' methods described in Section 8.6.6, for external use. However, since the names 'iterator_start' and 'iterator_next' are treated in a special way by the SETL system, the following code assumes that these methods are made available underthe names 'istart' and 'inext'. To adapt a pre-existing class to this requirement, one merely needs to add the following two lines to its body, and the two corresponding procedure declarations to its class header.

    procedure istart; iterator_start; end istart;    
    procedure inext; return iterator_next(); end inext;

Objects of type 'coord_iter' are created by calls

iter_obj := coord_iter(obj_tup,key_tup);

And then used simply by writing iterators over them in the ordinary SETL form, e.g.

for x in iter_obj loop... or [e(x): x in iter_obj ...]

However, such an iteration returns, not merely the values x selected (in increasing order) from the various obj_tup(j) involved in the coordinated iteration, but pairs [j,x] containing both the value x and the index j of the source object from which x was obtained. If only the values x are wanted, the first components j of these pairs must be dropped.

The first input 'obj_tup' of a 'coord_iter' creation call is the tuple of objects to be involved in the coordinated iteration, as explained above. The second input 'key_tup' is a tuple of mappings, where for each j key_tup(j) is assumed to send the successive elements of obj_tup(j) into values directly comparable by the SETL comparison operator '<='. Any component of key_tup can be replaced by OM, in which case the identiy map will be used in its place. We also allow any of the components obj_tup(j) to be a tuple rather than an object, in which case the iterator for it is assumed to be standard tuple iteration.

The code seen below manages coordinated iteration by maintaining a tuple 'current_vals' whose j-th component current_vals(j) is the most recent value obtained by applying an iteration step to the object obj_tup(j). In each cycle of coordinated iteration, the minimum component of current_vals is selected for return, and replaced by the next item drawn from obj_tup(j). The iteration-step operators neeed for this are held in an auxiliary tuple 'next_fcns'; when iteration over obj_tup(j) reaches its end, next_fcns(j) is set to OM, causing current_vals(j) to drop out of contention for the minimum to be returned. If obj_tup(j) is a tuple rather than some other kind of object, next_fcns(j) stores an iteration index, rather than an iteration-step function.

class useiters;            -- iteration test
    procedure istart;                -- wrapper for iterator_start
    procedure inext;                -- wrapper for iterator_next
end useiters;

class body useiters;            -- iteration test
    procedure istart; end istart;    -- wrapper for iterator_start
    procedure inext; end inext;        -- wrapper for iterator_next
end useiters;

class coord_iter;            -- coordinated iteration class
    procedure create(obj_tup,key_tup);            -- creator. parameters are list of objects and list of comparable keys 
end coord_iter;

class body coord_iter;        -- coordinated iteration class
    use useiters;
    var objs,keymaps,current_vals,next_fcns;        -- next_fcns(j) is set to OM when objs(j) is exhauasted
    
    procedure create(obj_tup,key_tup);            -- creator. parameters are list of objects and list of comparable keys 
        objs := obj_tup; keymaps := key_tup?[]; 
        for j in [1..lot := #obj_tup] loop  keymaps(j) ?:= ident; end loop;
        current_vals := [newat(): j in [1..lot]]; next_fcns := [newat(): j in [1..lot]]; 
    end create;
    
    procedure ident(x);    return x; end ident;        -- identity mapping

    procedure iterator_start;            -- begin iteration
        for obj = objs(j) loop
            if is_tuple(obj) then            -- object is a tuple; start with first component. ^next_fcns(j) is iteration index
                ^(next_fcns(j)) := if #obj = 0 then OM else 1 end if; 
                ^(current_vals(j)) := obj(1);
            else            -- object is not a tuple; start with first
                obj.istart(); 
                ^(next_fcns(j)) := if (val := (oin := obj.inext)()) = OM then OM else oin end if;
                ^(current_vals(j)) := if val = OM then OM else val(1) end if;
            end if;
        end loop;
    end iterator_start;                

    procedure iterator_next;            -- iteration cycle. Returns smallest item among non_exhausted, in pair

        ix_smallest := smallest_val := OM;
        was := false;
        
        for obj = objs(j) | ^(next_fcns(j)) /= OM loop            -- search for element to return
            was := true;
            if ix_smallest = OM then             -- first element
                smallest_key := keymaps(j)(smallest_val := ^(current_vals(ix_smallest := j)));
            elseif (possib_newkey := keymaps(j)(maybetter := ^(current_vals(j)))) < smallest_key then
                smallest_key := possib_newkey; 
                smallest_val := maybetter; ix_smallest := j;
            end if;
        end loop;

        if not was then return OM; end if;        --  iteration is exhausted
        
        if is_integer(nf := ^(next_fcns(ix_smallest))) then        -- we are dealing with a tuple

            if (newv := objs(ix_smallest)(nf +:= 1)) = OM then        -- object is exhausted
                ^(next_fcns(ix_smallest)) := OM;            -- flag object as exhausted
            else
                ^(current_vals(ix_smallest)) := newv;            -- post the new current value
                ^(next_fcns(ix_smallest)) := nf;
            end if;

        else        -- we are dealing with an object other than a tuple
            if (newv := nf()) = OM then        -- object is exhausted
                ^(next_fcns(ix_smallest)) := OM;            -- flag object as exhausted
            else
                ^(current_vals(ix_smallest)) := newv(1);            -- post the new current value
            end if;
        end if;
        
        return [[ix_smallest,smallest_val]];
    end iterator_next;

end coord_iter;

The following small test program can be used to examine the use of the preceding class.

program test;         -- test of coordinated iteration
    use coord_iter;

    cit := coord_iter([[3..12],[2..5],[1..10]],[]); 
    print([x: [-,x] in cit]);            -- drop first  components
end test;

It produces the output

[1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 12]

8.12 Object classes which convert small loops to expressions

Where applicable and not too baroque, expressions are often more succinct and less bug-prone than the more primitively explicit bits of procedural ccode which could be used to produce their values. For example,

{e(x): x in s | P(x)}

is better than

s := {}; for x in s | P(x) loop s with:= e(x); end loop;

and

v := +/t; is certainly more succinct and safer than v := 0; for x in t loop v +:= x; end loop;

SETL provides severaal constructs which exploit these advanteges, notably the set- and tuple-formers

{e(x): x in s | P(x)} and [e(x): x in s | P(x)]

and compound operators like '+/t' and 'max/t'. Note that the 'e(x)' clause in a set- or tuple-former gives succinct and general expression to the notion of element-by-element transformation f a composite, while the 'P(x)' clause handles the notion of 'filtering' equally effectively. Nevertheless, there are no end of other loop forms, code fragments, and situations where the same kind of transformation to expression form will occasionally be desirable. The code which follows shows how easily object classes can be used to realize thiskindoftransformation.

We present two object classes, called 'all' and 'each'. 'all' simply duplicates SETL's built-in compound operator by allowing expressions like

0 +/ t to be written as all(0) + t

Similarly 'each' allows expressions like

[tup(j) + c: c = t(j)] to be written as each(tup) + t.

For efficiency, both these classes inherit from an abstract class 'oploop' which assumes that a virtual method 'do_the_loop' will be supplied and passes operators like '+', "*', '**' to 'do_the_loop' for use in whatever loop form it realizes. Then 'all' simply supples the loop form

x := val_1; for y in tupset loop x := op(x,y); end loop; return x;

whereas 'each' supples the loop form

last := val_1(#val_1); return [op(val_1(j)?last,x): x = tup(j)];

class oploop;        --     uses operation loop to apply infix operator to composite
    -- this class has no methods
end oploop;

class body oploop;        --     uses operation loop to apply infix operator to composite
    var val_1;
    var do_the_loop;

    procedure self + tup;            -- addition
         op := lambda(x,y); return x + y; end lambda; return do_the_loop(op,tup);
    end;

    procedure self * tup;            -- product
         op := lambda(x,y); return x * y; end lambda; return do_the_loop(op,tup);
    end;

    procedure self - tup;            -- diff
- y; end lambda; return do_the_loop(op,tup);
    end;

    procedure self / tup;            -- division
         op := lambda(x,y); return x / y; end lambda; return do_the_loop(op,tup);
    end;

    procedure self ** tup;            -- exponentiation
         op := lambda(x,y); return x ** y; end lambda; return do_the_loop(op,tup);
    end;

    procedure self max tup;            -- max
         op := lambda(x,y); return x max y; end lambda; return do_the_loop(op,tup);
    end;

    procedure self min tup;            -- min
         op := lambda(x,y); return x min y; end lambda; return do_the_loop(op,tup);
    end;

end oploop;

class all;            -- functional transformation: compound operator
    inherit oploop;
    procedure create(init_val);            -- creator. 
end all;

class body all;            -- functional transformation: compound operator
    
    procedure create(init_val);            -- creator.
        val_1 := init_val; do_the_loop := doit;            -- save initial value
    end create;

    procedure doit(op,tupset);            -- operation loop
        x := val_1; for y in tupset loop x := op(x,y); end loop; return x;
    end doit;

end all;

class each;            -- functional transformation: compound operator
    inherit oploop;
    procedure create(init_val);            -- creator. 
end each;

class body each;            -- functional transformation: compound operator

    procedure create(init_val);            -- creator. 
        val_1 := if not is_tuple(init_val) then [init_val] else init_val end if;            -- save initial value
         do_the_loop := doit;    
    end create;

    procedure doit(op,tup);            -- operation loop
        last := val_1(#val_1); return [op(val_1(j)?last,x): x = tup(j)];
    end doit;

end each;

The preceding class is exercised by the test program

program test;            -- test of 'all' and 'each'
    use all,each;
    print(all(0) + [1..3]);
    print(each(3) + [1..3]);
    print(each([1..2]) + [1..3]);

    print(all(1) * [1..4]);
    print(each(3) * [1..3]);
    print(each([1..2]) * [1..10]);

end test;

which produces the output

		6
		[4, 5, 6]
		[2, 4, 5]
		24
		[3, 6, 9]
		[1, 4, 6, 8, 10, 12, 14, 16, 18, 20]

8.13 An Object Class Useful in Early Debugging of Recursions

Undebugged recursions in a program, like faulty while loops in it, may fail to terminate. An example is the following defective reccursive version of factorial, which fails to handle the base case for the recursion and so will recurse until memory is exhausted.

	program test;         -- test of rec_limit class
	
	    factorial_bad(10);
	    
	    procedure factorial_bad(n);            -- erroneous version of factorial procedure
	        return n *  factorial_bad(n - 1) ;                -- no mention of base case
	    end factorial_bad;
	
	end test;

In this section we present an object class, analogous tho the 'bugoff' package described 7.1.9, which can be used to detect and control faulty recursions of this type. The class is most easily used by assigning a newly assinged instance ofit to a global variable before the first call to the recursive procedure to be controlled, and by wrapping this procedure (under a modified name) into a small front-end alias rotine which has the same name and the form illustrated by

   procedure factorial_bad(n);            -- alias for erroneous version of factorial procedure
   					-- 'rec_limiter' is an instance of the 'rec_limit' class
        junk := rec_limiter{OM}; return rec_limiter(factorial_bad_original(n));
        		     -- factorial_bad_original is the procedure being debugged; is name must be changed to factorial_bad_original, 
        		     -- but noting else in it should be changed at  all
    end factorial_bad;

Accordingly the test program seen above takes on the slightly modified form

program test;         -- test of rec_limit class
    use rec_limit;
    var rec_limiter;

    rec_limiter := rec_limit("factorial",200);            -- set up recursion-level control

    factorial_bad(10);

    procedure factorial_bad(n);            -- alias for erroneous version of factorial procedure
        junk := rec_limiter{OM}; return rec_limiter(factorial_bad_original(n));            -- note procedure entry
    end factorial_bad;
    
    procedure factorial_bad_orig(n);            -- erroneous version of factorial procedure
        return n *  factorial_bad(n - 1) ;                -- note procedure return
    end factorial_bad_orig;

end test;

The creation routine for a rec_limit object has the form rec_limit(msg,n), where 'msg' designates a string message that should identify the recursion uniquely, and 'n' should be an integer defining the maximum alloweddepth of recursion. If this is exceeded a stop will be forced, and 'msg' will be incorporated in the resulting error string.

rec_limit objects support just two methods, obj{x} and obj(x), which should be used as in the above example, obj{x} to signal recursive entry and obj(x) to capture the corresponding returns. The class code is simply as follows.

class rec_limit;         -- class for imposing limit on  depth of recursion
    procedure create(msg,n);        -- error caption and allowed recursion depth
end rec_limit;
    
class body rec_limit;         -- class for imposing limit on  depth of recursion
    var the_msg,count,count_ptr;

    procedure create(msg,n);        -- error caption and allowed recursion depth
        the_msg := msg; count := n;  ^count_ptr := n;
                        -- retain message and counter value (which is forced to pointer semantics)
    end create;

    procedure self{x};        -- error caption and allowed recursion depth
        if (^count_ptr := ^count_ptr - 1) < 1 then  print("recursion limit " + str(count) + " exceeded: " + str(the_msg)); stop; end if;
    end;

    procedure self(x);        -- error caption and allowed recursion depth
        ^count_ptr := ^count_ptr + 1; return x;
    end;

end rec_limit;

8.14 Storing User-defined Objects In Files

Objects may be stored in either text or binary files, but may only be re-read from binary files. If an object is read from a binary file by a program which does not explicitly use the object's class, then the class will be loaded at the time the object is read. A similar load takes place if an object is converted from a string value with the unbinstr built-in procedure. After the class is implicitly loaded, all the methods corresponding to built-in operations will be available on objects of that class.

8.15 B-Trees: A Larger Example of the use of Object Classes

Several important technical issues arise in connection wih codes which need to work intensively with tuples or strings into which insertions and deletions need to be made in random positions. How are these abstract tuples or strings best represented? If reads and simple assignments predominate, the most efficient way of storing the string or tuple will be its 'flat' form, since this allows characters or components to be accessed by high-speed machine level indexing operations. However, sometimes insertions at random positions in such a string or tuple must be handled, as for example in the case of a tuple which represents a constantly changing database kept in alphabetical order of particular key-fields within it. In such cases the 'flat' tuple form suffers from the fact that repeated insertions may force large sections of the tuple or string (half the total, on average) to be moved repeatedly.

In such situations other, more insertion-friendly tuple representations suggest themselves. On can, for example, divide the original abstract tuple or string into roughly equal sections, for example 100 such sections, and then use the list of such sections (each of which is itself a subtuple) to represent the original tuple. This slows component acess only slightly, by turning one indexing operation into two: we must first access the section to which a desired component belongs, and then access the component within this section. However,the 'tuple of tuples' representation we contemplate greatly speeds up insertions, since for example, to insert a new component into the middle of the original tuple we may only need to move the following components of the subsection in which the insertion occurs, and this may be as little as 1% of the length of the original tuple.

Such a tuple-of-tuple representations has other advantages in connection with the copy operations which need to be applied when strings or tuples are being maintained in a SETL like value-semantics environment. Suppose, for example, that we need to manipulate a very long string, in which we want to change just one character while at the same time keeping a copy of the unchanged string. Keeping the string in a perfectly 'flat' representation may force us to copy the entire string before changing it. But if a sectional, tuple of strings representation is used instead, we will only need to copy the string section actually being changed, along with the top-level tuple which holds the succession of sections together. All the unchanged string sections can simply be shared between the original and the changed string. Here again we get a roughly 100-fold speedup, since the data elements which need to be copied should be only 1% of the size of the full string representation.

But if a two-level, tuple-of-subsections representation has these advantages, we can expect a three-level, tuple-of-tuple-of-subsections scheme to be even more effective. Pushing this idea to its limit suggests that the ultimately desired representation is a recursive tree-like structure of tuples-of-tuples-of-..., whose number of levels grows (logarithmically) with the size of the abstract tuple or string which it represents. This idea, which is that of the B-tree, one of the most famous and important of all non-elementary data structures, is in fact very sucessful. The code which follows in this section shows how this data structure can be used to represent all tuple operations, including component and slice retrieval, assignment, and insertion, tuple concatenation, and iteration over tuples. We give a further set of codes which uses a similar B-tree structure to represent large strings. Note also that the same kind of data structure is fundamental to the general SETL database design explored in Chapter 12.

B-trees can be thought of either as trees or as recursively tuples of objects which are themselves B-trees. The code which follows emphasises the second point of view. It defines a B-tree in the follwing way:

Each btup BT stores a tuple t in an internal 'instance variable' of the same name. BT also stores an integer height value h. If h = 1, our btup BT is pretty much a simple tuple, and its components can be arbitrary SETL objects. If h > 1 every component of the tuple t must itself be a btup object, of height h - 1. In what followsthe component btups of the tuple t will be calledthe children of the btup BT.
The length of the tuples found at every level of a B-tree (except possibly at its topmost level) lies between a specified lower limit 'minw' and upper limit 'maxw'. As a technical convenience we take maxw = 2 * minw - 1. (In the code which follows 'minw' is arbitrarily set to 5, so that the branching factor seen at every node of our trees lies beween 5 and 9. This means that a tree representing an abstract tuple of length 1,000,000 will have height about 8. When B-trees are used in a more realistic way to mangage databases, the minimum branching factor would have a value (proportioned to the size of the physical records of the disk system storing the database) more like 100, so that a tree representing an abstract tuple of length 1,000,000 would be of height no more than 3.)
Each B-tree object also stores an auxiliary 'cumulant' tuple 'cum' of integers. 'cum' has the same length as the tuple t of children stored in the btup BT, and the j-th component cum(j) of 'cum' gives the number of tree 'leaves' that have t(j) and nodes to its left as ancestor. That is, cum(j) is the length of the tuple 'tup' represented by the B-tree section t(1..j). The availabilty of these cumulants speeds up search for a given component tup(k) of 'tup'.

The conventions listed above suggest the main outlines of the B-tree code which follows. The 'btup' objects representing our B-trees store the instance variables h, tup, and cum described above. Two additional instance variables, 'iter_ptr' and 'compno', serve to speed up iterations over B-trees; these variables are given pointer semantics. The parameterless btup 'create' routine forms an empty B-tree, and the auxilary set(t) method changes this to a B-tree representing the parameter tuple t. (The set(t) method is recursive; it constructs the btups corresponding to small tuples diirectly, but builds the btup's representing larger abstarct tuples by recursively concatentating btup's representing half-size tuples.) An auxiliary range operator, which converts any btup object into the standard SETL tuple it represents, is provided to aid in debugging the btup code itself. A second routine 'check_consistency', which applies various consistency checks to btup objects, is also provided for debugging.

The component access and modification routines self(i) and self(i) := x; use the 'cum' data available in each btup object to search recursively for the component addressed. The self(i) := x; operator must also update all cumulants along the sequence of btup objects which it traverses. The length operator #self is elementary, since it merely needs to return the final cumulant value cum(#cum).

The btup concatenation operation x + y is fundamental to all the remaining btup routines. This begins by converting both its arguments to btups if they are not already btups. It the two arguments x andy are then of equal height h, their internal tuples and cumulant vectors are simply concatenated. This gives the desired result unless the resulting concatentated tuple exceeeds the allowed maximum length 'maxw', in which case the auxiliary routine 'ifsplit' divides the resulting btup into two roughly equal parts (each of which is guranteeed to be of length at least 'minw' since maxw = 2 * minw - 1). These two btups then become the two children of a new btup of height h + 1, which 'ifsplit' forms. But if the two arguments of x + y are of differnet heights (assume for example that y is shorter), then the concatenation procedure recursively concatenates y with the last of x's children. Application of 'ifsplit' to this subconcatenation may then generate a pair of btups rather than a single larger btup, and the presence of this extra btup, which must at the top level become a child of x + y, may force a final, top-level split, in which case the height of x + y grows to one more than the height h of x.

The with operation 'x with y' for a btup x clearly is expressible in terms of concatenation.

The next operator implemented is the end-slice extraction operator self(i..). After checking for trivial cases and out-of-bounds conditions, this locates the rightmost child t(j) of the original btup which contains any of the leaves of the desired slice. If this is the last child of self, the end-slice is recursively extracted from this child. Otherwise an end-slice of t(j) is recursively extracted and concatenated with the tuple of children of self which follow t(j).

Even though the initial-slice extraction operator self(1..i) could be programmed in much the same way, the code for this operator is incorporated into the general slice extraction operator self(i..j). Like self(i..), this first checks for trivial cases and out-of-bounds conditions. The special self(1..i) case is then handled in a manner largely symmetrical to the treatment of self(i..). The general case is handled by locating the rightmost child t(k) of the original btup BT which contains any of the leaves of the desired slice, and also the leftmost child t(m) of BT which contains any of the leaves of the desired slice. Appropriate end and start-slices of t(k) and t(m) are then extracted recursively, and concatenated with those children of BT which lie between t(m) and t(k); this constructs the btup slice desired.

The slice assignment operator self(i..j) := x; is implemented (after checking for out-of-bounds conditions and easy special cases) by catenating the retained parts of self (there can be at most two of these) with the btup x.

The extraction operations from, fromb, and frome are elementary speical cases of the slice extraction operators.

The btup 'iterator_start' routine prepares for simple iteration over a btup object by setting up iterator stack, which becomes the value referenced by the iter_ptr instance variable held in the btup. This is a stack of pairs [tup,posn] containing the sequence of btup nodes and their positions leading down to the tuple leaf current at any moment during simple iteration over a btup. The associated iteration method, 'iterator_next', advances this stack by incrementing the topmost 'posn' pointer which has not reached the end of its associated 'tup', and then re-initializing the part of thestack which lies above this incremented item. The related routines 'set_iterator_start' and 'set_iterator_next' are alomost identical to 'iterator_start' and 'iterator_next', but returns pairs [x,index] of elements rather than simply the range elements x.

class btup;                -- btree class representing tuples
    class var debug := false;

    procedure create();      -- create blank btup
    procedure set(t);        -- set from tuple

    procedure check_consistency();        -- check consistency of tree

end btup;

class body btup;                -- btree class representing tuples
    const minw := 5, maxw := 2 * minw - 1;            -- minimum number of descendants, except at top
    
    var h := 1,cum := [],tup := [];            -- height, cumulant tuple, top-level tuple;
    var iter_ptr,compno;                      -- iteration pointer and count
    
    procedure create();        -- create empty B-tree top from tuple
        iter_ptr := newat(); compno := newat();            -- set up iteration stack pointer and compno pointer
        return self;
    end create;

    procedure set(t);        -- set a B-tree from a tuple;

        if (nt := #t) <= maxw then 
            h := 1; cum := [1..nt]; tup := t;
            return self;
        else        -- make two trees of half length, and concatenate them
            bt1 := btup(); bt1 := bt1.set(t(1..nt/2));        -- first half
            bt2 := btup(); bt2 := bt2.set(t(nt/2 + 1..));        -- second half
            return bt1 + bt2;        -- return concatenated version
        end if;
    end set;

    procedure self(i);        -- component extraction
        return if h = 1 then tup(i) elseif i <= cum(1) then tup(1)(i)
                elseif exists c = cum(j) | i <= c then tup(j)(i - cum(j - 1))
                    else OM end if;
    end;

    procedure self(i) := x;        -- component change
        if h = 1 then tup(i) := x; elseif i <= cum(1) then tup(1)(i) := x;
                elseif exists c = cum(j) | i <= c then tup(c)(i - cum(j - 1)) := x; 
                    else abort("index " + str(i) + " is out of range"); end if;
    end;
    
    procedure range self;        -- convert to tuple
        return if h <= 1 then tup else [] +/ [-t: t in tup] end if;
    end;

    procedure #self;        -- total leaf length
        return cum(#cum);
    end;

    procedure self + x;        -- btup concatenation

        if is_tuple(xx := x) then x := btup(); x := x.set(xx); end if;        -- force second argument to btuple
        if #cum = 0 then return x; end if; if #x.cum = 0 then return self; end if;        -- null cases

        new := btup(); 

        if x.h = h then         -- concatenated btup has same height

            new.tup := tup + x.tup; new.h := h;

            c1l := cum(#cum); new.cum := cum + [c + c1l: c in x.cum];
            return new.ifsplit();

        elseif x.h < h then        -- concatenated btup is shorter

            new.tup := tup; new.cum := cum;    new.h := h;            -- copy the element

            end_elt := tup(nt := #tup);        -- the final element
            end_elt := end_elt + x;        -- catenate 1 level down

            if end_elt.h < h then                -- the subconcatenation has not split
                (new.tup)(nt) := end_elt;            -- just install
                new.cum(nt) := new.cum(nt) + x.cum(#x.cum);        -- adjust the cumulant

                return new;
            end if;
            
            new.tup(nt..nt) := end_elt.tup;    new.h := h;    -- otherwise add tuple at top level
            c1ml := if nt = 1 then 0 else cum(nt - 1) end if; 
            new.cum(nt..nt) := [c + c1ml: c in end_elt.cum];

            return new.ifsplit();
            
        else        -- otherwise concatenated element is taller

            new.tup := x.tup; new.cum := xc := x.cum;    new.h := xh := x.h;            -- copy x
            first_x_elt := x.tup(1);        -- the first element of x
            tot_cum := cum(#cum);            -- total cumulant of this tree
            
            first_x_elt := self+ first_x_elt;        -- catenate 1 level down
            if first_x_elt.h < xh then                -- the subconcatenation has not split
                (new.tup)(1) := first_x_elt; 
                for j in [1..#xc] loop new.cum(j) +:= tot_cum; end loop;        -- adjust the later cumulants
                return new;        -- just install
            end if;

            new.tup(1..1) := first_x_elt.tup;        -- otherwise add tuple at top level
            new.cum(1..1) := first_x_elt.cum;        -- likewise cumulant
            for j in [3..#xc + 1] loop new.cum(j) +:= tot_cum; end loop;        -- adjust the later cumulants
            
            return new.ifsplit();
    
        end if;
    end;

    procedure ifsplit();        -- split into 2 nodes if overpopulated

        if (nt := #tup) <= maxw then self.check_consistency(); return self; end if;            -- needn't split

        t1 := tup(1..nto2 := nt/2); t2 := tup(nto2 + 1..);
        c1 :=  cum(1..nto2); c2 := cum(nto2 + 1..);
        cum1 :=  cum(nto2); cum2 := cum(nt);
        
        new1 := btup(); new1.h := h; new1.tup := t1; new1.cum := c1;
        new2 := btup(); new2.h := h; new2.tup := t2; new2.cum := [c - cum1: c in c2];
        newtop := btup();        -- new node of population 2
        newtop.tup := [new1,new2]; newtop.cum := [cum1,cum2]; newtop.h := h + 1;

        return newtop;
        
    end ifsplit;

    procedure x + self;        -- concatenation with btup argument on  right
        if is_tuple(xx := x) then x := btup(); x := x.set(xx); else abort("bad concatenation argument: " + str(x)); end if;        
                            -- force first argument to btuple
        return x + self;
    end;

    procedure selfwith x;        -- item concatenation
        return self+ [x];
    end;

    procedure self(i..);        -- end slice extraction

        if i > (cncp1 := cum(nc := #cum) + 1) or i < 1 then abort("endslice index out of range: " + str(i)); end if;
        if i = cncp1 then return btup(); end if;        -- empty result
        if i = 1 then return self; end if;                -- the whole shebang
        
        must := exists c = cum(j) | c >= i;
        if h = 1 then             -- minimal heght case; slice the tuple
            new := btup(); new.h := 1; new.tup := tup(i..); new.cum := [1..nc - i + 1]; 
            return new;
        end if;

        cumbef := if j = 1 then 0 else cum(j - 1) end if;         -- prior cumulant
        tj := tup(j);
        
        if j = nc then  return tj(i - cumbef..); end if;        -- extract from last component 
        
        tail := btup(); tail.h := h; tail.tup := tup(j + 1..);
        cj := cum(j); tail.cum := [c - cj: c in cum(j + 1..)];
    
        return tj(i - cumbef..) + tail;        -- catenate tail to front piece, and return

    end;

    procedure self(i..j);        -- general slice extraction

        if i > (cncp1 := (cnc := cum(nc := #cum)) + 1) or i < 1 then 
            abort("first slice index out of range: " + str(i)); 
        end if;
        if j < i - 1 or j > cnc then abort("second slice index out of range: " + str(i)); end if;
        if j = i - 1 then return btup(); end if;        -- return an empty btuple

        if h = 1 then             -- minimal heght case; slice the tuple
            new := btup(); new.h := 1; new.tup := tup(i..j); new.cum := [1..j - i + 1]; 
            return new;
        end if;

        must := exists c = cum(jloc) | c >= j;    -- find the top-level tup location of the index j

        if i = 1 then            -- prefix slice extraction
    
            tj := tup(jloc);
            
            if jloc = 1 then return tj(1..j); end if;        -- extract from first component 
            
            pref := btup();         -- will generate new prefix
            cumbef := cum(jloc - 1); subslice := tj(1..j - cumbef);        -- 'subslice' is part of final  node to take
            
            if jloc = 2 then         -- would produce tree with #tup = 1; so descend 1 level
                pref.h := h - 1; pref.tup := (t1 := tup(1)).tup; pref.cum := t1.cum;        -- prefix is identical with first node
            else                    -- taking prefix part won't produce tree with #tup = 1
                pref.h := h; pref.tup := tup(1..jloc - 1); pref.cum := cum(1..jloc - 1);
            end if;
            
            return pref + subslice;        -- catenate tail to front piece, and return

        end if;

        must := exists c = cum(iloc) | c >= i;    -- otherwise find the top-level tup location of the index i
        ind := tup(iloc);            -- the subtup contining i
        prior_cum := if iloc = 1 then  0 else cum(iloc -  1) end if;    -- cumulant prior to this node

        if iloc = jloc then     -- if the two tup locations are the same, then do a subextraction
            return ind(i - prior_cum..j - prior_cum);
        end if;

        jnd := tup(jloc);            -- the subtup contining i
        jprior_cum := cum(jloc -  1);    -- cumulant prior to jnd
        ipart := ind(i - prior_cum..); jpart := jnd(1..j - jprior_cum);        -- head and tail extraction

        if iloc + 1 = jloc then return ipart + jpart; end if;    -- if the two tup locations differ by 1, just catenate

        newmid := btup();      --  otherwise the middle tree has just 1 node; generate new middle
    
        if iloc + 2 < jloc then         -- middle has at least 2 nodes
            newmid.h := h; newmid.tup := tup(iloc + 1..jloc - 1); 
            cumbef := cum(iloc); newmid.cum := [c - cumbef: c in cum(iloc + 1..jloc - 1)]; 
        else                            -- middle has just 1 node; reduce height
            newmid := tup(iloc + 1);
        end if;    
        
        return ipart + newmid + jpart;             -- return concatenation of the front, middle, and end
        
    end;

    procedure self(i..j) := x;        -- slice assignment

        if is_tuple(xx := x) then             -- convert to btup
            x := btup(); x := x.set(xx);
        elseif type(x) /= "BTUP" then
            abort("illegal slice-assignment righ-hand side: " + str(x));
        end if;
        
        if i > (cncp1 := (cnc := if (nc := #cum) = 0 then 0 else cum(nc) end if) + 1) or i < 1 then 
            abort("first slice-assignment index out of range: " + str(i)); 
        end if;
        if j < i - 1 or j > cnc then abort("second slice-assignment index out of range: " + str(i)); end if;

        if i = 1 then             -- over-written  part is prefix

            if j = cnc then    -- all over-written; copy x to self
                h := x.h; tup := x.tup; cum := x.cum; return x;        -- modify this btup; return right-hand side
            end if; 
            
            tail := self(j + 1..); new := btup(); new.h := x.h; new.tup := x.tup; new.cum := x.cum; 
            new := new  + tail;            --  catenate the assigned part plus the retained part
            h := new.h; tup := new.tup; cum := new.cum;        -- modify this btup
            return x;                                            -- return right-hand side
            
        end if;
        
        pref := self(1..i - 1);             -- get the retained prefix
        new := btup(); new.h := x.h; new.tup := x.tup; new.cum := x.cum;         -- the assigned part
        
        if j = cnc then        -- over-written part is suffix
            new := pref + new;            --  catenate the retained part plus the assigned part 
        else                 -- over-written part is middle
            tail := self(j + 1..); new2 := btup(); new2.h := x.h; new2.tup := x.tup; new2.cum := x.cum; 
            new := pref + new + tail;            --  catenate the retained part plus the two assigned parts 
        end if; 

        h := new.h; tup := new.tup; cum := new.cum;            -- modify this btup
        return x;                                            -- return right-hand side

    end;

    procedure self(i..) := x;        -- end slice assignment
    	self(i..cum(#cum)) := x;
       return x;                                            -- return right-hand side
    end;

    procedure from self;        -- front extraction
        if tup = [] then return OM; end if;        -- null case
        x := self(1); y := self(2..);
        h := y.h; cum := y.cum; tup := y.tup; 
        return  x;
    end;

    procedure frome self;        -- end extraction
        if tup = [] then return OM; end if;        -- null case

        x := self(ln := cum(#cum)); y := self(1..ln - 1);
        h := y.h; cum := y.cum; tup := y.tup; 
        return  x;
    end;

    procedure fromb self;        -- end extraction
        if tup = [] then return OM; end if;        -- null case
        x := self(1); y := self(2..);
        h := y.h; cum := y.cum; tup := y.tup; 
        return  x;
    end;
    
    procedure iterator_start;         -- initialize simple iteration over btup
            -- sets up iterator stack as value referenced by iter_ptr.  This is a stack of pairs [tup,posn]
        stack := [];        -- to be built

        node  := self;
        for j in [1..h] loop
            stack with:= [notup := node.tup,if j = h then 0 else 1 end if]; node := notup(1); 
        end loop;

        ^iter_ptr := stack;            -- attach stack to  iteration pointer

    end iterator_start;
    
    procedure iterator_next();         -- step simple iteration over btup
            -- returns value as  singleton,  or OM if terminating
            -- advances iterator stack referenced by iter_ptr
        stack := ^iter_ptr;        -- retrieve the iterator stack

        height := 1;
        
        for j in [ns := #stack,ns - 1..1] loop
            if (sj := stack(j))(2) = #sj(1) then 
                height +:= 1; removed frome stack;        -- remove any exhausted element
            else
                exit;
            end if;
        end loop;
        
        if height = 1 then 
            removed frome stack;    -- pop the top element; then advance it
            [tup,loc] := removed; 
            result := tup(loc +:= 1); stack(ns) := [tup,loc];
            ^iter_ptr := stack; return [result];        -- return singleton tuple built from leaf element
        end if;
        
        if stack = [] then return OM; end if;        -- iteration is exhausted

        removed frome stack;    -- pop the top element, then advance it and use to rebuild rest of stack
        [tup,loc] := removed; node := tup(loc +:= 1); stack with:= [tup,loc];

        for j in [1..hm1 := height -  1] loop            -- rebuild the stack, starting with the node that was advanced 
            stack with:= [notup := node.tup,if j = hm1 then 0 else 1 end if]; node := notup(1); 
        end loop;

        removed frome stack;    -- pop the top element; then advance it
        [tup,loc] := removed; result := tup(loc +:= 1); stack(ns) := [tup,loc];
        ^iter_ptr := stack; return [result];        -- return singleton tuple built from leaf element
        
    end iterator_next;
    
    procedure set_iterator_start;         -- initialize second-form iteration over btup (similar to iterator_start)
        -- sets up iterator stack as value referenced by iter_ptr
        stack := [];        -- to be built
        ^compno  := 0;
        
        node  := self;
        for j in [1..h] loop
            stack with:= [notup := node.tup,if j = h then 0 else 1 end if]; node := notup(1); 
        end loop;

        ^iter_ptr := stack;            -- attach stack to  iteration pointer

    end set_iterator_start;
    
    procedure set_iterator_next();         -- step second-form iteration over btup
        -- returns value as  singleton,  or OM if terminating
        -- advances iterator stack referenced by iter_ptr
        stack := ^iter_ptr;        -- retrieve the iterator stack
        ^compno := cno := ^compno + 1;        -- advance the  component number
        
        height := 1;
        
        for j in [ns := #stack,ns - 1..1] loop
            if (sj := stack(j))(2) = #sj(1) then 
                height +:= 1; removed frome stack;        -- remove any exhausted element
            else
                exit;
            end if;
        end loop;
        
        if height = 1 then 
            removed frome stack;    -- pop the top element; then advance it
            [tup,loc] := removed; 
            result := tup(loc +:= 1); stack(ns) := [tup,loc];
            ^iter_ptr := stack; return [[cno,result]];        -- return singleton tuple built from leaf element
        end if;
        
        if stack = [] then return OM; end if;        -- iteration is exhausted

        removed frome stack;    -- pop the top element, then advance it and use to rebuild rest of stack
        [tup,loc] := removed; node := tup(loc +:= 1); stack with:= [tup,loc];

        for j in [1..hm1 := height -  1] loop            -- rebuild the stack, starting with the node that was advanced 
            stack with:= [notup := node.tup,if j = hm1 then 0 else 1 end if]; node := notup(1); 
        end loop;

        removed frome stack;    -- pop the top element; then advance it
        [tup,loc] := removed; result := tup(loc +:= 1); stack(ns) := [tup,loc];
        ^iter_ptr := stack; return [[cno,result]];        -- return singleton tuple built from leaf element
    end set_iterator_next;
 
    procedure check_consistency();        -- check consistency of tree

        if h > 1 then -- the number of nodes must be in the correct range
            if exists nd = tup(j) | (nc := #(nd.tup)) < minw or nc > maxw then 
                abort("node count out of range: " + str(nc) + " " + str(nd)); 
            end if;
            if exists nd = tup(j) | nd.h /= h - 1 then 
                abort("node height inconsistency: " + str(j) + " " + str(nd.h) + " " + str(h)); 
            end if;
        end if;
        
                    -- the cumulant differences at this level must be the sum of those at the next lower level
        if h > 1 then 
            if exists nd = tup(j) | cum(j) /=  if j = 1 then 0 else cum(j - 1) end if + nd.cum(#nd.cum) then
                abort("bad cumulant: loc = "  + str(j) + " cum at loc = "  + str(cum(j)) + " priorcum  = " 
                              + if j = 1 then "" else str(cum(j - 1)) end if  + " node cum = "  + str(nd.cum(#nd.cum))); --  + " "  + str(nd)
            end if; 
        elseif cum /= [1..#tup] then
            abort("bad bottom-level cumulant: " + str(cum) + " " + str(nd)); 
        end if;
        
        if h > 1 and (exists nd in tup | not nd.check_consistency()) then abort("bad abort"); end if;
        
        return true;            -- passed the consistency check
        
    end check_consistency;
   
end btup;

8.13Another form of SETL Extension: Use of operation-error cases to invoke SETL library functions.

As detailed in the preceding sections of this in Chapter, extended meanings for SETL's infix operators and other built-in syntactic constructs are specified by including function definitions with header lines of forms like

		procedure self + y; ...
		procedure self(x); ...

		procedure self{x}; ...

		procedure self(i..j); ...

		procedure self(x) := y; ...

		procedure self{x} := y; ...

		procedure self(i..j) := y; ...

		procedure iterator_start; ...

		procedure iterator_next; ...

		procedure set_iterator_start; ...

		procedure set_iterator_next; ...

in an object-class definition. This makes it possible give SETL operators convenient new meanings for types of objects which are not part of SETL's standard object repertoire. However, it is sometimes appropriate to extend the meaning of SETL operators to cases involving only standard objects, but in wasy that SEL does not allow (or, better said, does not use.) For example, if f is a function or one argument and s is a set or tuple, it will sometimes be convenient to use f * s as an abbreviation for {f(x): x in s} or [f(x): x in s].

More specifically, SETL allows operation-error cases, that is, object-operator combinations that are not supported in SETL's built-in library, which would ordinarily trigger run-time aborts, to be redefined as function invocations. This mimics the SETL mechanism for extending built-in operations to user-defined object classes.

SETL allows all the redefinitions listed above (excepting the iterator redefinitions) to be applied, in what currently are operation error cases, to built-in SETL objects. This is done by useing special packages whose name starts with 'error_extension_' in the syntactic environment (a package or a class) within which such an extension or group of extensions is to apply. (Several such package can be used in such an environment, but if two of these redefine the same operation, they cancel each other and neither redefinition applies; this is the same rule which applies to conflicting function definitions of other kinds which appear in multiple packages used within a single syntactic context.)

For example, SETL does not define the mutiplication operation f * s if s is a set. But by writing

	package error_extension_f_usage;
	end error_extension_f_usage;

	package body  error_extension_f_usage;
		procedure self * s; return if is_set(s) then {self(x): x  in s} else [self(x): x  in s] end if; end;
	end error_extension_f_usage;
	
	program test;
		print(t := float * [1..10],cos * t);
	end test;

we remedy this omission and can use this otherwise forbidden diction.

Note the simple syntactic rules which apply here:

'error_extension_' packages require package headers,but these are always empty.
The syntactic form of SETL operator redefinitions within 'error_extension_' packages is the same as that within object classes. In particular, the first argument of any such redefinition must always be called self.
In addition to its object redfinitions, an 'error_extension_' package can, like other packages, contain auxiiary function definitions, but no of these can be visible externally.

A final rule, not relevant to this example, is that each program, package, or class can use at most one 'error_extension_' package. However, each 'error_extension_' package can use one other, allowing chains of error extensions to be used in the manner described below.

Here is a second example. SETL does not define the division operation s / t if s and t are both strings, so in the 'default' SETL environment this case of the division operator will be treated as a run-time error. However, if a package_named 'error_extension_xxx' is used in this environment, and if that package contains an operator definition procedure of the form

		procedure self / y;..

then this procedure will be invoked at the point of error, 's' being passed as its first argument self and 't' as its second argument y. Within the procedure, one would expect to find a test or case statement such as

	case type(self)

	when "STRING" => (code for evaluating self / y when both are strings)
	when "TUPLE" => (etc.)

	otherwise => 
		return self / y;
	end if;

The final 'return self / y;' simply propagates the original error. This may merely result in a somewhat delayed version of the run-time error abort that the divide operation would have triggered in the first place. However, since the package 'error_extension_xxx' may itself use another package 'error_extension_zzz', this secondary error may in turn trigger a call to a procedure of the form

		procedure self / y;..

found in 'error_extension_zzz', and so on iteratively until either the original operation s / t finds its interpretation, or an actual error-abort results.

These conventions make otherwise illegal cases like s / t, where s and t are both strings, usable within SETL programs, packages, and classes. The scope within which any such error extension applies is a single program, package, and class PPorC, the applicable error redefinition being specified, as explained above, by the 'error_extension_...' package used in PPorC.

Extensions of this kind are potentially handy, especially for exploring contemplated SETL extensions. Suppose, for example, that we let s / t be the operation which cuts any string s at all the occurrences of t in s, thereby forming a tuple; and (ii) applies recursively to sets and tuples of strings. Then

	"a,b;c,d;ee,ff" / ";" / "," 	is 	[["a","b"],["c","d"],["ee","ff"]].

But "a,b c,d ee,ff"/ ";" / "," is clearly a more convenient form of input than [["a","b"],["c","d"],["ee","ff"]]. Suppose next that we define t * s, where t is a tuple and s is a string, to be the concatenation t(1) + s + t(2) + s + ... + t(#t). Then if s, a, and b are all strings, s / a * b replaces every occurence of a in s by an occurence of b.

Many useful extensions of this kind suggest themselves. If s, t, and u are strings and n is an integer, s(t) can designate the index of the first occurence of t in s (or OM if there is none); s(t) := u can replace the first occurence of t by u; s(n..t) can designate the index of the first occurence after position n of t in s, etc. If t is a set or tuple and f is a one-parameter procedure, f * t can designate the result of applying f to each of the components or elements of t. Many other such cases suggest themselves, as ways of lending futher flexibility to SETL.

The rules governing the syntactic scopes within which error redefinitions apply facilitate their combination. Suppose, for example, that a useful definition for the quotient s / t has already been found and made available within a package 'error_extension_strings', and that we now want to extend this to the case in which s is a string but t is a set (for example, t might be a set of strings, and s / t might break s at each occurence of any string x in t, but leave these x in the resulting sequence.) To do this, one could simply write a package 'error_extension_strings2' in which

		procedure self / y;..

had a form something like

		procedure self / y;
			if (type(y) = "SET") then 
				...
			else
				return self / y;
			end if;
		end;

If the 'error_extension_strings2' package 'use's the 'error_extension_strings' package, any non-erroneous definition of s / t which it supplies will be applied, but otherwise the definition of s / t supplied by 'error_extension_strings' will be used. If no meaning can ultimately be found for s / t, an error abort will result.

We give a few specific examples illustrating the use of error extension packages. Suppose that the two following error packages have been compiled into the SETL library.

package error_extension_floats;
end error_extension_floats;

package body  error_extension_floats;

	procedure self * s;
		if is_integer(self) then self:= float(self); end if;
		if is_integer(s) then s := float(s); end if;
		return self * s;
	end;

	procedure self + s;
		if is_integer(self) then self:= float(self); end if;
		if is_integer(s) then s := float(s); end if;
		return self + s;
	end;

	procedure self - s;
		if is_integer(self) then self:= float(self); end if;
		if is_integer(s) then s := float(s); end if;
		return self - s;
	end;

	procedure self / s;
		if is_integer(self) then self:= float(self); end if;
		if is_integer(s) then s := float(s); end if;
		return self / s;
	end;

	procedure self ** s;
		if is_integer(self) then self:= float(self); end if;
		if is_integer(s) then s := float(s); end if;
		return self ** s;
	end;

end error_extension_floats;

package error_extension_misc;
end error_extension_misc;

package body  error_extension_misc;
	use string_utility_pak;
	use error_extension_floats;

	procedure self + s; return self + s; end;
	procedure self ** s; 

		case type(self)
                when "STRING" => return self(#self + s + 1..);  
                otherwise => return self ** s;
		end case;

	end;

	procedure self / s;		-- breakup function, represented as division

		case type(s)
			when "STRING" => return breakup(self,s);
						-- uses string splitting routine from 'string_utility_pak'
			otherwise => return self / s;
		end case;
		
	end;

	procedure self - s;		-- suppress_chars function, represented as subtraction

		case type(s)
			when "STRING" => return suppress_chars(self,s);	-- from 'string_utility_pak'
			otherwise => return self - s;
		end case;

	end;

	procedure self * s;		-- 'join' function, represented as multiplication
		var my_self;
		my_self := self;
	
		case type(self)

			when "TUPLE" =>
	
				case type(s)

					when "STRING" => 

						stg := if #self = 0 then self else self(1) end if;
						for x = self(j) | j > 1 loop stg +:= (s + x); end loop;
						return stg;

					when "TUPLE" => return [self(j) * x: x = s(j)];
								-- lengths assumed equal

				end case;

			when "PROCEDURE" =>
	
				case type(s)
	
					when "TUPLE" => return [self(x): x in s];

					when "SET" => return {self(x): x in s};

					when "PROCEDURE" =>	
						return lambda(x); return my_self(s(x)); end lambda;

				end case;
			
			when "REAL" =>

				case type(s)

					when "TUPLE" => return [self * x: x in s];

					when "SET" => return {self * x: x in s};
			
				end case;

		otherwise => return self * s;

		end case;

	end;

	procedure self mod x;		-- alternative syntax: self(x)

		case type(x)
			when "STRING" => return word_loc(self,x);
			when "INTEGER" => return self(#self + 1 + x);
					-- character from end; x must be negative
		end case;

	end;

	procedure self max x;		-- alternative syntax: self(x..)

		case type(x)
			when "INTEGER" => return self(#self + 1 + x..);
					-- character from end; x must be negative
		end case;

	end;

end error_extension_misc;

The following test program exercises this miscellany of error extensions:
program err_extension_test; -- test of error-extension packages use error_extension_misc; -- use error_extension_floats; -- this must be omitted: would cause errors print([2 * 0.5,2 + 0.5,2 - 0.5,2 / 0.5,0.5 ** 2]); print(float * [1..3]); print(0.5 * (float * [1..3])); print((0.5 * [1..3]) * (0.5 * [1..3])); print(cos * (0.5 * [1..3])); print((cos * sin) * (0.5 * [1..3])); print(cos * (sin * (0.5 * [1..3]))); print("ainbancofdinaaaofbbbanccc" / ".in.an.of"); print("a..b.;.c" - ".;"); print("a b c" / " " * "..."); print("jack tom mike" mod "tom"); -- "jack tom mike"("tom") print("jack" mod - 1); -- "jack(-1) print("jack" ** - 2); -- "jack(-2..) end err_extension_test;

The output produced by this program is
[1.0000000000, 2.5000000000, 1.5000000000, 4.0000000000, 0.25000000000] [1.0000000000, 2.0000000000, 3.0000000000] [0.50000000000, 1.0000000000, 1.5000000000] [0.25000000000, 1.0000000000, 2.2500000000] [0.87758256189, 0.54030230587, 0.070737201668] [0.88726005072, 0.66636674539, 0.54240850453] [0.88726005072, 0.66636674539, 0.54240850453] ["a", "in", "b", "an", "c", "of", "d", "in", "aaa", "of", "bbb", "an", "ccc", ""] abc a b c... 6 k ck