Abstract Data Types

The use of abstract data types (ADTs) as a structuring mechanism for programs is a well-established practice.

To review, an ADT is a type together with some operators on that type, such that

the operators have an external behavior specification

the type has a private internal representation

the operators have private internal implementations

Clients of the ADT see only the external specification, none of the internal details. This makes it possible for the implementer of the ADT to change the internals without affecting the clients.

In effect, the implementer and client agree to an interface contract specifying how values of the type may be used. Ideally, this contract should be completely expressible in our programming language so that the compiler can enforce it. Since operator behaviors are hard to specify, the best we get in practice is an approximation of the contract: a signature specifying the type of each operator.

Example: Sets

Consider an abstract type of sets of values. We might give it the following signature, using an ML-like notation:

code51
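The signature listing is not reproduced in this copy. As a rough sketch only, assuming four primitives named empty, insert, remove and member taking tupled arguments (the name SET and the argument style are guesses, not the original text), it could be written in Standard ML as:

```sml
(* Hypothetical reconstruction of a signature for sets.
   The names SET, empty, insert, remove and member are assumptions. *)
signature SET =
sig
  type 'a set
  val empty  : 'a set
  val insert : 'a * 'a set -> 'a set
  val remove : 'a * 'a set -> 'a set
  val member : 'a * 'a set -> bool
end
```

An implementation that relies on the built-in equality would need the equality type variable ''a in place of 'a for insert, remove and member.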

By itself, this doesn't tell us how the various functions should behave. We could specify the desired behavior (semi-formally) using some equations such as these:

code53
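The equations themselves are missing from this copy. One plausible set of three, assuming the primitives empty, insert, remove and member in tupled form, relates member to each way of building a set. The sketch below states them in a comment and packages them as a checker; checkSpec and its record argument are hypothetical names introduced only for illustration.

```sml
(* Assumed equations (a reconstruction, not the original listing):
     member (x, empty)         = false
     member (x, insert (y, s)) = (x = y) orelse  member (x, s)
     member (x, remove (y, s)) = (x <> y) andalso member (x, s)   *)

(* checkSpec tests the three equations at one choice of x, y and s
   for any candidate implementation passed in as a record. *)
fun checkSpec {empty, insert, remove, member} (x, y, s) =
  member (x, empty) = false
  andalso member (x, insert (y, s)) = (x = y orelse member (x, s))
  andalso member (x, remove (y, s)) = (x <> y andalso member (x, s))
```

Such a checker can only test the equations at particular points; it does not prove that an implementation satisfies them.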

We could add many more equations that we would hope to be true, but these three turn out to be sufficient to completely characterize the external behavior of the primitives.

(Of course, there are other possible signatures, including fundamentally more powerful ones, e.g., supporting union, intersection, etc.)

ML Implementation

Here's a simple implementation of sets in ML, using (unordered) lists (possibly with duplicates).

code56
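The implementation listing is not reproduced here. A minimal sketch of what a list-based version might look like, assuming the tupled primitives empty, insert, remove and member:

```sml
(* Sketch: sets as unordered lists, duplicates allowed. *)
type 'a set = 'a list

val empty : 'a set = []

(* insert just conses; duplicates are tolerated *)
fun insert (x, s : 'a set) : 'a set = x :: s

(* remove must delete every copy of x, since duplicates may exist *)
fun remove (x, s : ''a set) : ''a set = List.filter (fn y => y <> x) s

(* member uses the built-in equality, hence the ''a type variable *)
fun member (x, s : ''a set) : bool = List.exists (fn y => y = x) s
```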

This implementation is not at all abstract.

'a set is just a synonym of 'a list, allowing arbitrary lists to be treated like sets.

Clients can access, and hence depend on, the internal representation of sets.

Clients can "spoof" the operators into thinking an arbitrary (bogus) list is a set.

(Note appearance of equality type variable ''a.)
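The lack of abstraction can be seen concretely in a small sketch. It assumes a tupled member function over a type synonym; the names spoofed and peeked are illustrative only.

```sml
(* With a type synonym, sets and lists are interchangeable. *)
type 'a set = 'a list
fun member (x, s : ''a set) = List.exists (fn y => y = x) s

(* A plain list literal is accepted wherever a set is expected. *)
val spoofed = member (2, [1, 2, 2, 3])

(* A client can apply any list operation to a set, e.g. hd. *)
val peeked = hd ([1, 2] : int set)
```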

Towards Abstraction

We've already seen a way to distinguish the 'a set type from 'a list: make set a datatype, e.g.,

code66
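The datatype listing is missing from this copy. A sketch of such a declaration, assuming a single constructor SET wrapping a list and the same tupled primitives as before:

```sml
(* Sketch: a distinct type for sets, built from a list by SET. *)
datatype 'a set = SET of 'a list

val empty = SET []
fun insert (x, SET l) = SET (x :: l)
fun remove (x, SET l) = SET (List.filter (fn y => y <> x) l)
fun member (x, SET l) = List.exists (fn y => y = x) l
```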

But the client of the type can still see inside a set by matching it against a pattern with the SET constructor. And the client can still invent bogus sets by applying SET to arbitrary lists. So we still don't have proper abstraction.

Key idea: since all access to the contents of a datatype depends on the constructors, obtain abstraction by hiding the constructors from the client!

Further Towards Abstraction

How might we hide the constructors? As a first approach, let's try using the local facility:

code76
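The listing for this attempt is not reproduced here. A sketch of how local might be used, assuming the same SET constructor and tupled primitives: the datatype is declared in the local part, so only the operators are visible afterwards.

```sml
(* Sketch: the datatype (and its constructor SET) is visible only
   between "local" and "end"; the operators are exported. *)
local
  datatype 'a set = SET of 'a list
in
  val empty = SET []
  fun insert (x, SET l) = SET (x :: l)
  fun remove (x, SET l) = SET (List.filter (fn y => y <> x) l)
  fun member (x, SET l) = List.exists (fn y => y = x) l
end
```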

Abstypes

This comes close: client can no longer access internals of a SET. But there are still some problems:

The Set type now has no proper name at all.

Yet the top-level display still knows that the underlying representation uses lists.

And the built-in equality operator is able to compare sets, which is inappropriate if they are really abstract.
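The equality problem can be illustrated with a small sketch, assuming a local declaration with a SET constructor over lists. Two values that denote the same set compare as unequal, because the built-in equality compares the underlying lists.

```sml
local
  datatype 'a set = SET of 'a list
in
  val empty = SET []
  fun insert (x, SET l) = SET (x :: l)
end

(* Both a and b represent the set {1, 2} ... *)
val a = insert (1, insert (2, empty))
val b = insert (2, insert (1, empty))

(* ... but = compares the lists [1,2] and [2,1], giving false. *)
val sameRepresentation = (a = b)
```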

It turns out that we need a special language mechanism to achieve the precise effect we want. This is the abstype declaration. It's just like a datatype except that it comes with a list of operator functions; they can see the datatype definition, but external clients cannot.

Abstype example

code83
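The example listing is missing from this copy. A sketch of an abstype declaration for list-based sets, assuming the same SET constructor and tupled primitives used in the earlier attempts:

```sml
(* Sketch: SET is visible only to the operators between
   "with" and "end"; clients see the type name set but
   neither its constructor nor its representation. *)
abstype 'a set = SET of 'a list
with
  val empty = SET []
  fun insert (x, SET l) = SET (x :: l)
  fun remove (x, SET l) = SET (List.filter (fn y => y <> x) l)
  fun member (x, SET l) = List.exists (fn y => y = x) l
end
```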

Abstract types print as "-" and are not equality types. (In what way do they remain slightly non-abstract, though?)

One last objection: we still have no way of separating (in the program text or in time) the specification of the ADT interface from its implementation. We'll see how to do this with the module system soon.

An alternative implementation

We can now change the implementation of sets without any chance of invalidating client code that uses them. (Of course, client code does need to be recompiled.)

For example, we might arrange that the representation lists contain only unique elements. This will lower space requirements for sets into which the same elements are repeatedly inserted, and in general speed up removes at the cost of slowing down inserts.

code89
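The listing for this alternative is not reproduced here. A sketch under the same assumptions (SET constructor, tupled primitives): insert checks membership first, and remove can stop at the first match because no element occurs twice.

```sml
(* Sketch: representation lists never contain duplicates. *)
abstype 'a set = SET of 'a list
with
  val empty = SET []

  fun member (x, SET l) = List.exists (fn y => y = x) l

  (* insert is slower: it must scan the list before adding *)
  fun insert (x, s as SET l) =
    if member (x, s) then s else SET (x :: l)

  (* remove is faster: at most one copy exists, so stop once found *)
  fun remove (x, SET l) =
    let
      fun drop [] = []
        | drop (y :: ys) = if y = x then ys else y :: drop ys
    in
      SET (drop l)
    end
end
```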

An excessively alternative implementation

We might now be tempted to improve our Set implementation further by representing sets as sorted trees. This should improve the asymptotic time behavior of all the primitives. (Why not try sorted lists?)

But we have no < operator with which to compare the order of arbitrary 'a values.

In fact, our intended implementation only makes sense for sets of values that can be ordered, and depends on the choice of ordering relation. (This points up that our existing list-based implementations only make sense for sets of values on which the built-in equality predicate is valid - and may do the "wrong thing" even on these.)

One solution is to make the order predicate an explicit parameter of the ADT signature (a new signature!). It is convenient to specify the parameter (just) when creating a new set, and then carry it as part of each value.

code96
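The tree-based listing is missing from this copy. The following is only a sketch of the idea, assuming a strict "less than" predicate lt that is passed to empty and stored in every set value. The tree type, the helper names and the use of an unbalanced binary search tree are all assumptions; without balancing, the logarithmic bounds hold only when the tree happens to stay shallow.

```sml
datatype 'a tree = LEAF | NODE of 'a tree * 'a * 'a tree

abstype 'a set = SET of ('a * 'a -> bool) * 'a tree
with
  (* the ordering is supplied once, when the set is created *)
  fun empty lt = SET (lt, LEAF)

  fun member (x, SET (lt, t)) =
    let
      fun look LEAF = false
        | look (NODE (l, y, r)) =
            if lt (x, y) then look l
            else if lt (y, x) then look r
            else true                      (* neither is smaller: equal *)
    in look t end

  fun insert (x, SET (lt, t)) =
    let
      fun ins LEAF = NODE (LEAF, x, LEAF)
        | ins (n as NODE (l, y, r)) =
            if lt (x, y) then NODE (ins l, y, r)
            else if lt (y, x) then NODE (l, y, ins r)
            else n                         (* already present *)
    in SET (lt, ins t) end

  fun remove (x, SET (lt, t)) =
    let
      (* removeMin returns the smallest element and the remaining tree *)
      fun removeMin (NODE (LEAF, y, r)) = (y, r)
        | removeMin (NODE (l, y, r)) =
            let val (m, l') = removeMin l in (m, NODE (l', y, r)) end
        | removeMin LEAF = raise Empty
      fun del LEAF = LEAF
        | del (NODE (l, y, r)) =
            if lt (x, y) then NODE (del l, y, r)
            else if lt (y, x) then NODE (l, y, del r)
            else (case r of
                    LEAF => l
                  | _ => let val (m, r') = removeMin r
                         in NODE (l, m, r') end)
    in SET (lt, del t) end
end
```

Note that empty is now a function of the ordering rather than a constant, which is why this counts as a new signature.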

A Function-based Implementation

ML provides no direct support for enforcing the equational specification that we gave originally - merely for making sure that the type signature is respected and that values of the ADT cannot be put together or taken apart by clients.

code99

But as a final implementation of (equality-based) Set, we'll use the equational spec quite directly. The idea is to represent a set by its own membership function!

code103
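The listing is not reproduced in this copy. A sketch of the idea, assuming the tupled primitives empty, insert, remove and member: each operator builds a new membership function whose body mirrors one of the membership equations.

```sml
(* Sketch: a set is its own membership test. *)
abstype 'a set = SET of 'a -> bool
with
  (* nothing is a member of the empty set *)
  val empty = SET (fn _ => false)

  (* membership is just function application *)
  fun member (x, SET f) = f x

  (* y is in insert (x, s) if y = x or y was already in s *)
  fun insert (x, SET f) = SET (fn y => y = x orelse f y)

  (* y is in remove (x, s) if y <> x and y was in s *)
  fun remove (x, SET f) = SET (fn y => y <> x andalso f y)
end
```

One cost of this design is that member takes time proportional to the number of insert and remove operations that built the set, not to the number of elements in it.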

Note that this implementation approach extends well to infinite sets too.
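As an illustration, suppose we add one extra primitive that turns an arbitrary predicate into a set. The name fromPredicate is hypothetical and is not part of the signature discussed in these notes; the sketch only shows that nothing in the representation requires a set to be finite.

```sml
abstype 'a set = SET of 'a -> bool
with
  val empty = SET (fn _ => false)
  fun member (x, SET f) = f x
  fun insert (x, SET f) = SET (fn y => y = x orelse f y)
  fun remove (x, SET f) = SET (fn y => y <> x andalso f y)

  (* Hypothetical extra primitive: any predicate describes a set. *)
  fun fromPredicate p = SET p
end

(* The infinite set of even integers, and the same set without 4. *)
val evens = fromPredicate (fn n => n mod 2 = 0)
val evensWithout4 = remove (4, evens)

val tenIsEven = member (10, evens)
val fourStillThere = member (4, evensWithout4)
```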

Andrew P. Tolmach
Thu May 15 21:24:19 PDT 1997