Abstract Data Types
The use of abstract data types (ADTs) as a structuring mechanism for programs is well-established practice.
To review, an ADT is a type together with some operators on that type, such that
the operators have an external behavior specification
the type has a private internal representation
the operators have private internal implementations
Clients of the ADT see only the external specification, none of the internal details. This makes it possible for the implementer of the ADT to change the internals without affecting the clients.
In effect, the implementer and client agree to an interface contract specifying how values of the type may be used. Ideally, this contract should be completely expressible in our programming language so that the compiler can enforce it. Since operator behaviors are hard to specify, the best we get in practice is an approximation of the contract: a signature specifying the types of each operator.
Example: Sets
Consider an abstract type of sets of values. We might give it the following signature, using an ML-like notation:
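The signature itself is not reproduced in these notes. As a sketch in Standard ML, assuming operator names empty, insert, remove and member (inferred from later discussion; the original names are an assumption), it might read as follows. The equality type variables ''a are there because an equality-based implementation has to compare elements.

```sml
(* Hypothetical SML signature for sets; operator names are assumptions. *)
signature SET =
sig
  type 'a set
  val empty  : 'a set                     (* the set with no elements *)
  val insert : ''a set * ''a -> ''a set   (* add an element *)
  val remove : ''a set * ''a -> ''a set   (* delete an element *)
  val member : ''a set * ''a -> bool      (* test for membership *)
end
```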
By itself, this doesn't tell us how the various functions should behave. We could specify the desired behavior (semi-formally) using some equations such as these:
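The equations are not reproduced here either. One plausible set of three, assuming the same operator names and stating everything in terms of member (both assumptions), is:

```latex
\begin{aligned}
\mathit{member}(\mathit{empty},\, y) &= \mathit{false} \\
\mathit{member}(\mathit{insert}(s, x),\, y) &= (x = y) \lor \mathit{member}(s, y) \\
\mathit{member}(\mathit{remove}(s, x),\, y) &= (x \neq y) \land \mathit{member}(s, y)
\end{aligned}
```

Each equation fixes the result of member on a set built by one primitive, for arbitrary s, x and y.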
We could add many more equations that we would hope to be true, but these three turn out to be sufficient to completely characterize the external behavior of the primitives.
(Of course, there are other possible signatures, including fundamentally more powerful ones, e.g., supporting union, intersection, etc.)
ML Implementation
Here's a simple implementation of sets in ML, using (unordered) lists (possibly with duplicates).
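The list-based code is not reproduced here. A minimal sketch, assuming the operator names empty, insert, remove and member, could look like this; remove deletes every copy of an element because duplicates are allowed.

```sml
(* Sets as unordered lists, possibly with duplicates.
   Operator names are assumptions. *)
type 'a set = 'a list              (* a mere synonym: no abstraction *)

val empty : 'a set = []

(* insert just conses; duplicates are allowed *)
fun insert (s : 'a set, x : 'a) : 'a set = x :: s

(* remove drops every occurrence, since duplicates may be present *)
fun remove (s : ''a set, x : ''a) : ''a set = List.filter (fn y => y <> x) s

(* member is a linear scan using the built-in equality *)
fun member (s : ''a set, x : ''a) : bool = List.exists (fn y => y = x) s
```

Because 'a set is only a synonym, a client can call member ([5, 6], 6) on a plain list and the type checker accepts it.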
This implementation is not at all abstract.
'a set is just a synonym for 'a list, allowing arbitrary lists to be treated like sets.
Clients can access, and hence depend on, the internal representation of sets.
Clients can "spoof" the operators into treating an arbitrary (bogus) list as a set.
(Note the appearance of the equality type variable ''a.)
Towards Abstraction
We've already seen a way to distinguish the 'a set type from 'a list: make set a datatype with a single constructor, SET, that wraps a list.
But a client of the type can still see inside a set by matching it against a pattern with the SET constructor. And the client can still invent bogus sets by applying SET to arbitrary lists. So we still don't have proper abstraction.
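A sketch of the datatype version, again assuming the operator names empty, insert, remove and member; peek and bogus are hypothetical client code that exhibit the two leaks just described.

```sml
(* Sets as a datatype wrapping a list; SET is visible to every client. *)
datatype 'a set = SET of 'a list

val empty = SET []
fun insert (SET s, x) = SET (x :: s)
fun remove (SET s, x) = SET (List.filter (fn y => y <> x) s)
fun member (SET s, x) = List.exists (fn y => y = x) s

(* Leak 1 (hypothetical client): pattern matching exposes the list. *)
fun peek (SET s) = s

(* Leak 2 (hypothetical client): any list can be turned into a "set". *)
val bogus = SET [3, 3, 3]
```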
Key idea: since all access to the contents of a datatype depends on the constructors, obtain abstraction by hiding the constructors from the client!
Further Towards Abstraction
How might we hide the constructors? As a first approach, let's try using the local facility:
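A minimal sketch, assuming the same operator names: the datatype is declared between local and in, so the SET constructor is out of scope after end, while the four operators remain visible.

```sml
(* Only the declarations between "in" and "end" are visible afterwards;
   the datatype and its SET constructor are not. *)
local
  datatype 'a set = SET of 'a list
in
  val empty = SET []
  fun insert (SET s, x) = SET (x :: s)
  fun remove (SET s, x) = SET (List.filter (fn y => y <> x) s)
  fun member (SET s, x) = List.exists (fn y => y = x) s
end
```

After this declaration, writing SET [1] is rejected because SET is no longer in scope, yet two sets can still be compared with =, since the hidden datatype still admits equality.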
Abstypes
This comes close: client can no longer access internals of a SET. But there are still some problems:
The Set type now has no proper name at all.
Yet the top-level display still knows that the underlying representation uses lists.
And the built-in equality operator is able to compare sets, which is inappropriate if they are really abstract.
It turns out that we need a special language mechanism to achieve the precise effect we want. This is the abstype declaration. It's just like a datatype except that it comes with a list of operator functions; they can see the datatype definition, but external clients cannot.
Abstype example
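The original example is not reproduced here. A minimal sketch, assuming the same operator names empty, insert, remove and member, might be:

```sml
(* The representation and the SET constructor are visible only to the
   declarations between "with" and "end". *)
abstype 'a set = SET of 'a list
with
  val empty = SET []
  fun insert (SET s, x) = SET (x :: s)
  fun remove (SET s, x) = SET (List.filter (fn y => y <> x) s)
  fun member (SET s, x) = List.exists (fn y => y = x) s
end
```

Outside the abstype, SET is not in scope, and for a set s an expression such as s = s is a type error, because abstypes do not admit equality.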
Values of abstract types print as "-", and abstract types are not equality types. (In what way do they remain slightly non-abstract, though?)
One last objection: we still have no way of separating (in the program text or in time) the specification of the ADT interface from its implementation. We'll see how to do this with the module system soon.
An alternative implementation
We can now change the implementation of sets without any chance of invalidating client code that uses them. (Of course, client code does need to be recompiled.)
For example, we might arrange that the representation lists contain only unique elements. This will lower space requirements for sets into which the same elements are repeatedly inserted, and in general speed up removes at the cost of slowing down inserts.
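A sketch of that variant, assuming the same operator names: insert now checks membership first, and remove can stop at the first match because the invariant guarantees at most one copy.

```sml
(* Invariant: the representation list contains no duplicates. *)
abstype 'a set = SET of 'a list
with
  val empty = SET []

  (* insert must scan first to preserve the invariant: slower *)
  fun insert (SET s, x) =
    if List.exists (fn y => y = x) s then SET s else SET (x :: s)

  (* remove stops at the first (and only) occurrence: faster *)
  fun remove (SET s, x) =
    let
      fun drop [] = []
        | drop (y :: ys) = if y = x then ys else y :: drop ys
    in
      SET (drop s)
    end

  fun member (SET s, x) = List.exists (fn y => y = x) s
end
```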
An excessively alternative implementation
We might now be tempted to improve our Set implementation further by representing sets as sorted trees. This should improve the asymptotic time behavior of all the primitives. (Why not try sorted lists?)
But we have no < operator with which to compare the order of arbitrary 'a values.
In fact, our intended implementation only makes sense for sets of values that can be ordered, and depends on the choice of ordering relation. (This shows that our existing list-based implementations likewise only make sense for sets of values on which the built-in equality predicate is valid, and may do the "wrong thing" even on these.)
One solution is to make the order predicate an explicit parameter of the ADT signature (a new signature!). It is convenient to specify the parameter (just) when creating a new set, and then carry it as part of each value.
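A sketch of the tree-based set, assuming the ordering is a strict less-than predicate passed to empty and stored in every set value; the tree datatype and the names lt, mem, ins, rem and popMin are hypothetical. The trees are unbalanced, so the improvement holds on average rather than in the worst case.

```sml
(* Binary search trees. Visible to clients, but harmless: without the
   hidden SET constructor they cannot get at a set's tree. *)
datatype 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* Each set carries its own strict ordering predicate lt. *)
abstype 'a set = SET of ('a * 'a -> bool) * 'a tree
with
  (* The ordering is supplied exactly once, when an empty set is made. *)
  fun empty lt = SET (lt, Leaf)

  (* Elements are "equal" when neither is less than the other. *)
  fun member (SET (lt, t), x) =
    let
      fun mem Leaf = false
        | mem (Node (l, y, r)) =
            if lt (x, y) then mem l
            else if lt (y, x) then mem r
            else true
    in
      mem t
    end

  fun insert (SET (lt, t), x) =
    let
      fun ins Leaf = Node (Leaf, x, Leaf)
        | ins (n as Node (l, y, r)) =
            if lt (x, y) then Node (ins l, y, r)
            else if lt (y, x) then Node (l, y, ins r)
            else n                          (* already present *)
    in
      SET (lt, ins t)
    end

  fun remove (SET (lt, t), x) =
    let
      (* Split a non-empty tree into its least element and the rest. *)
      fun popMin (Node (Leaf, y, r)) = (y, r)
        | popMin (Node (l, y, r)) =
            let val (m, l') = popMin l in (m, Node (l', y, r)) end
        | popMin Leaf = raise Fail "popMin: empty tree"
      fun rem Leaf = Leaf
        | rem (Node (l, y, r)) =
            if lt (x, y) then Node (rem l, y, r)
            else if lt (y, x) then Node (l, y, rem r)
            else (case r of
                    Leaf => l
                  | _ => let val (m, r') = popMin r in Node (l, m, r') end)
    in
      SET (lt, rem t)
    end
end
```

For instance, empty Int.< creates an empty set of integers. The element type no longer needs to admit equality, since the ordering predicate alone decides when two elements coincide.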
A Function-based Implementation
ML provides no direct support for enforcing the equational specification that we gave originally - merely for making sure that the type signature is respected and that values of the ADT cannot be put together or taken apart by clients.
But as a final implementation of (equality-based) Set, we'll use the equational spec quite directly. The idea is to represent a set by its own membership function!
Note that this implementation approach extends well to infinite sets too.
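A sketch of the function-based representation, assuming the operator names empty, insert, remove and member, with each operator transcribed from a membership equation (shown in the comments). fromPredicate is a hypothetical extra operator, not part of the original signature, used here to build an infinite set.

```sml
(* A set is represented by its membership function. *)
abstype 'a set = SET of 'a -> bool
with
  (* member (empty, y) = false *)
  val empty = SET (fn _ => false)

  (* member (insert (s, x), y) = (x = y) orelse member (s, y) *)
  fun insert (SET f, x) = SET (fn y => x = y orelse f y)

  (* member (remove (s, x), y) = (x <> y) andalso member (s, y) *)
  fun remove (SET f, x) = SET (fn y => x <> y andalso f y)

  fun member (SET f, x) = f x

  (* Hypothetical extension: any predicate denotes a (possibly infinite) set. *)
  fun fromPredicate p = SET p
end
```

For example, fromPredicate (fn n => n mod 2 = 0) is the infinite set of even integers. The trade-off is that every insert or remove wraps another closure, so member takes time proportional to the number of operations applied, and there is no way to enumerate the elements.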