CS301 W'99 Lecture Notes, Lecture 15
PSU CS301 W'99 Lecture 15 © Andrew Tolmach 1992-99

Need for abstraction

Is a new type name a genuinely new type, equivalent to the built-in types?

E.g., let's implement a stack using an array:

    TYPE stack = ARRAY 100 OF INTEGER;
    VAR s : stack := ARRAY(100,0);
    VAR top : INTEGER := 0;
    PROCEDURE push(i:INTEGER; s:stack) IS
    BEGIN
      s[top] := i;
      top := top + 1;
    END;
    ...

o User of stack can abuse stack discipline, e.g., s[random] := 42;
o stack, s, top, push, etc. aren't grouped together.
o Intended use of stack isn't explicit.

On the other hand, machine datatypes usually are presented abstractly. We don't write

    if (x & 0x80000000) printf ("x is negative");

We'd like to "extend the language" by making user-defined types "act like" built-in hardware types.

Abstract Data Types (ADT's)

Ideally, to mimic the behavior of built-in hardware-based types, user-defined types should have an associated set of operators, and it should only be possible to manipulate values of these types via their operators (and maybe a few generic operators such as assignment or equality testing).

In particular, when new types are given a representation in terms of existing types, it shouldn't be possible for programs to inspect or change the fields of the representation.

Such a type is called an abstract data type (ADT), because to clients (users) of the type, its implementation is hidden; only its interface is known.

We can implement an ADT by combining a type definition together with a set of functions operating on the type into a module (or package, cluster, class, etc.). Additional hiding features are needed to make the type's representation more-or-less invisible outside the module.

Abstraction

Compare to procedural abstraction: a procedure can be called if its type is known, even if its implementation is not.
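As a rough illustration of the ADT idea applied to the stack from the start of the lecture, here is a minimal C++ sketch. It is not from the original notes: the names IntStack, push, pop, empty, size, and kCapacity are all assumptions chosen for the example. The array and the top index are private, so a client cannot write the equivalent of s[random] := 42, and the operators are grouped with the type.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical stack ADT; names and the capacity of 100 are illustrative
// assumptions that mirror the lecture's array-based pseudo-code.
class IntStack {
public:
    // Push a value; throws if the fixed-size representation is full.
    void push(int value) {
        if (top_ == kCapacity) throw std::overflow_error("stack full");
        items_[top_++] = value;
    }

    // Pop the most recently pushed value; throws if the stack is empty.
    int pop() {
        if (top_ == 0) throw std::underflow_error("stack empty");
        return items_[--top_];
    }

    bool empty() const { return top_ == 0; }
    std::size_t size() const { return top_; }

private:
    // Representation: invisible to clients, so stack discipline
    // cannot be bypassed by indexing the array directly.
    static constexpr std::size_t kCapacity = 100;
    std::array<int, kCapacity> items_{};
    std::size_t top_ = 0;
};
```

A client can only call push, pop, empty, and size; an attempt to touch items_ or top_ from outside the class is rejected at compile time.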
Benefits of abstraction:
o Implementation and client can be developed independently.
o Implementation can be changed without affecting client's code.
o Improves clarity, maintainability, etc.

Example: ADT for Environments (pseudo-PCAT)

    SIGNATURE env IS
      TYPE env;
      VAR empty : env;
      PROCEDURE extend (e:env; s:STRING; i:INTEGER) : env;
      PROCEDURE lookup (e:env; s:STRING) : INTEGER;
    END;

    MODULE env : env IS
      TYPE env = RECORD id: STRING; val: INTEGER; next: env; END;
      VAR empty : env := NIL;
      PROCEDURE extend (e:env; s:STRING; i:INTEGER) : env IS
      BEGIN
        RETURN env {id := s; val := i; next := e};
      END;
      PROCEDURE lookup (e:env; s:STRING) : INTEGER IS
      BEGIN
        WHILE e <> NIL DO
          IF e.id = s THEN RETURN e.val; END;
          e := e.next;
        END;
        RETURN -1;
      END;
    END;

Client code is restricted:

    (* client *)
    VAR x := env.empty;
    x := env.extend(x,"abc",99);
    env.lookup(x,"def");
    print (x.next.val); (* NONO! *)

Thus, bodies can be changed without affecting clients. (Note: the following implementation is not actually as general as the first one, e.g., its capacity is fixed, but it still matches the signature.)

    MODULE env : env IS
      TYPE envr = RECORD id: STRING; val: INTEGER; END;
      TYPE env = ARRAY 100 OF envr;
      VAR empty := ARRAY(100,NIL);
      PROCEDURE extend (e:env; s:STRING; i:INTEGER) : env IS
      BEGIN
        VAR c : INTEGER := 0;
        WHILE (e[c] <> NIL) DO c := c + 1; END;
        e[c] := envr {id := s; val := i};
        RETURN e;
      END;
      PROCEDURE lookup (e:env; s:STRING) : INTEGER IS
      BEGIN
        VAR c : INTEGER := 0;
        VAR a : INTEGER := -1;
        WHILE (e[c] <> NIL) DO
          IF (e[c].id = s) THEN a := e[c].val; END;
          c := c + 1;
        END;
        RETURN a;
      END;
    END;

Interface vs. Implementation

Ideally, the client of an ADT is not supposed to know or care about its internal implementation details, only about its exported interface.
Thus, it makes sense to separate the textual description of the interface from that of the implementation, e.g., into separate files.
o Specifications give the names of types, and the names and types of functions in the package.
o Bodies give the definitions of the types and functions mentioned in the specification, and possibly additional private definitions.

One advantage of this separation is that clients of module X can be compiled on the basis of the information in the specification of X, without needing access to the body of X (which might not even exist yet!).

But many languages, particularly in the C/C++ tradition, don't make this separation very cleanly.

Is abstraction always desirable?

Although the idea of defining explicitly all the operators for a type makes good logical sense, it can get quite inconvenient.

Programmers are used to assigning values or passing them as arguments without worrying about their types. They may also expect to be able to compare them, at least for equality, without regard to type.

So most languages that support ADT's have built-in support for these basic operations, defined in a uniform way across all types. They also usually have facilities for overriding the built-in definitions with type-specific versions. For example, built-in equality on intset is unlikely to work on the contents of the set, so we probably want a type-specific equality operator. (Some of the complexity of C++ derives from this.)

Unfortunately, it is impossible to generate code for operations that move or compare data without knowing things like the size and layout of the data. But these are characteristics of the type's implementation, not its interface. So these "universal" operations break the abstraction barrier around the type.

Thus, supporting these operations conflicts with separate compilation, often weakening support for the latter.
The problem can also be solved, at some cost in efficiency, by treating all abstract values as fixed-size pointers to heap-allocated values.

Modules in General

An ADT is one particular kind of module, containing:
o a single abstract type, with its representation;
o a collection of operators, with their implementations.

Instances of the ADT are typically created dynamically, and contain space for the components of the representation; all the instances share the same operator code.

More generally, modules might contain:
o multiple type definitions;
o arbitrary collections of functions (not necessarily abstract operators on the type);
o variables;
o constants;
o exceptions; etc.

The primary purpose is to divide large programs into (somewhat) independent sections, offering separate namespaces and perhaps separate compilation.

Interfaces

Even when a module does not represent a particular abstract data type, it usually represents a kind of abstraction over some set of facilities, in which some implementation information will be hidden behind an interface.

Clients of a module want to know what the module does, not how it does it. Of course, specifying "what" is a hard problem! A key goal is that it should be possible to change the implementation without rewriting (or ideally, even recompiling) the client code that depends on the interface.

Most languages use type information to give a partial characterization of what a module does. An interface definition is then a collection of identifiers with their types.

In many languages it is possible to write and compile client code based solely on type interfaces. Of course, there must also be an (at least informal) specification of what the module's facilities do, and few languages provide any support for making sure that the implementations adhere to more than a type specification.
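To make the claim about compiling clients from a type interface alone more concrete, here is a minimal C++ sketch of the environment ADT. It is an illustration under assumptions, not the lecture's code: the names EnvNode, Env, env_empty, env_extend, env_lookup, and client_demo are invented for the example, and the three sections would normally live in separate files. It also uses the fixed-size pointer representation: clients hold only a pointer to an incomplete type, so they can copy and pass Env values without knowing the layout.

```cpp
#include <string>

// ---- interface (would live in a header; all names are hypothetical) ----
struct EnvNode;               // incomplete type: layout hidden from clients
using Env = const EnvNode*;   // every Env value is one fixed-size pointer

Env env_empty();
Env env_extend(Env e, const std::string& id, int val);
int env_lookup(Env e, const std::string& id);

// ---- client (compiles using only the declarations above) ----
int client_demo() {
    Env x = env_empty();
    x = env_extend(x, "abc", 99);
    // Writing x->val here would be a compile error, because EnvNode
    // is still an incomplete type at this point in the file.
    return env_lookup(x, "abc");
}

// ---- implementation (would live in a separate source file) ----
struct EnvNode {
    std::string id;
    int val;
    Env next;
};

Env env_empty() { return nullptr; }

Env env_extend(Env e, const std::string& id, int val) {
    // Heap-allocate a new binding in front of e.
    // Nodes are never freed in this sketch.
    return new EnvNode{id, val, e};
}

int env_lookup(Env e, const std::string& id) {
    for (; e != nullptr; e = e->next) {
        if (e->id == id) return e->val;
    }
    return -1;  // same "not found" convention as the pseudo-PCAT version
}
```

Because client_demo appears before the definition of EnvNode, the compiler itself checks that the client depends only on the interface; the implementation section could be swapped for another representation without touching the client.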

Modules in C?

Even C provides a (primitive) form of (unnamed) modules, i.e., files.
o The top-level declarations in a file are its components.
o By default, all components are exported, but they can be hidden using the static specifier.
o The .h file serves as a rough kind of interface specification. Manual methods must be used to ensure that such files are accurate and complete, and that they are used where needed.

The major defect of C's approach is that all the names exported from all the files linked into a program occupy one global name space, and hence must be unique. There is no "dot" notation.

Parameterization

We often want to write a module definition that is parameterized over another type, e.g., sets of elements:

    SIGNATURE set(element) IS
      TYPE set(element);
      PROCEDURE INSERT(s: set(element); e: element);
      PROCEDURE MEMBER(s: set(element); e: element) : BOOLEAN;
      PROCEDURE UNION(s,t: set(element)) : set(element);
    END;

    MODULE set(element) : set(element) IS
      ...
    END;

    MODULE intset = set(INTEGER); (* INSTANTIATE *)

Key question: does each instantiation require recompilation of module code? Yes, in Ada or C++; No, in ML.

Note that we can only parameterize sensibly over certain classes of types, e.g., set(element) only makes sense if there is an equality operator on element. (Cf. Haskell type classes.)
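
For comparison, a parameterized set along the lines of set(element) can be sketched with a C++ template. This is an assumption-laden illustration rather than part of the notes: the names Set, insert, member, set_union, size, and IntSet are chosen for the example, and the vector-based representation is just the simplest one that needs only equality on the element type.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical parameterized set; requires only operator== on Element.
template <typename Element>
class Set {
public:
    // Add e unless an equal element is already present.
    void insert(const Element& e) {
        if (!member(e)) items_.push_back(e);
    }

    // std::find compares with operator==, so Element must support it.
    bool member(const Element& e) const {
        return std::find(items_.begin(), items_.end(), e) != items_.end();
    }

    // Return a new set holding the elements of both sets.
    Set set_union(const Set& other) const {
        Set result = *this;
        for (const Element& e : other.items_) result.insert(e);
        return result;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<Element> items_;  // unordered, no duplicates
};

// Instantiation, analogous to MODULE intset = set(INTEGER).
using IntSet = Set<int>;
```

Each distinct instantiation such as Set<int> causes the compiler to generate code for that element type, which matches the "Yes, in C++" answer above. Instantiating Set with a type that lacks operator== fails only when member is compiled for that type, which illustrates the unstated requirement on element.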