Flat sequences of characters have few advantages as a data structure for character strings other than simplicity. Tree or graph structures are generally more flexible and more efficient. This example of the use of AOP is an (incomplete) implementation of "stronger strings" called cords in Java, in an object-oriented style.
One of the most common operations on strings is concatenation. For concatenation to be both time and space efficient in a world where strings are immutable, it is necessary for the result to share as much of the data structure with its arguments as possible.
The natural data structure represents a concatenation of two
strings as a
Concat object that
contains references to its arguments. Since the arguments may also be
Concat objects, we quickly arrive
at a tree structure in which the internal nodes represent
concatenation and the leaves are
Flat strings that store the
actual characters. The basic string operations like
equals, etc. propagate down the
tree as far as necessary.
In fact, the structure is not a tree but a directed graph, since it is quite possible to concatenate a string to itself. Because strings are immutable this sharing causes no problems and locking is never necessary, so will will continue to refer to trees.
A substring of such a structured string can be represented by a
Substring object that stores a
reference to the base string of which it is a substring, and the
Given this tree representation, fetching the ith
character of a string
s will take
time proportional to the depth of the tree. If the tree is balanced,
this will be log2
s.length(). However, since
strings are usually built by concatenation, maintaining balance at
all times requires that the concatenation operation rebalance the
tree, which means that concatenation will also take logarithmic
Because the typical usage pattern of a string is a period in which it is built by concatenation followed by a period in which it is manipulated, this is probably a bad tradeoff. Instead we provide a client-callable balance method, which achieves approximate balance, and only force a rebalance if the tree becomes extremely lopsided.
The cord data structure is very amenable to the addition of other
kinds of leaf string in addition to flat strings. For example, a
FileString might store most of
its characters in a file, except for a small buffer. New kinds of
interior nodes can also be added. For example, the
map message currently traverses
every character in a string applying a client-provided translation
function, and produces a new string containing the translated
characters. This operation could also be implemented lazily, by
Map class that stores
the base string and the translation function, and which applies the
function to the characters when (and if) they are demanded.
To make these extensions possible,
cord.String is an interface, not
cord.String is similar
java.lang.String, but contains
some additional methods such as
perform that do string traversal.
This is because such traversals can be performed in linear time,
whereas traversing a string by repeated calls to
charAt would take time n
log n for a string of length n.
These extensions do not seem to require aspects; class structure provides appropriate locality.
The basic cord classes were first implemented, using the interface
cord.String. These are
Empty. All of these classes
which provides no representation but does provide abstract
implementation of many of the utility methods. No account was taken
of balancing. These classes were tested and debugged without
balancing, and as expected gave rise to excessively deep trees.
Balancing was then added as an aspect. To know when a string is
balanced, and to avoid rebalancing already balanced sub-trees, every
must be able to answer its depth efficiently. The depth of a
Flat string is zero and the depth
Concat string is one greater
than the maximum depth of its children. So the first thing that the
Balance aspect does is introduce
an additional (
Concat objects, an additional
depth() into interface
String, and four implementations
of the depth method:
Flat.depth() simply return
Concat.depth() return the value
of the depth field.
It is also necessary to initialize the depth field when the
Substring objects are created.
This is accomplished by before advise weaves on the
constructors. In addition to calculating the value of depth, these
weaves take the opportunity to explicitly rebalance the argument
Strings if they are excessively deep.
A String s is balanced if
s.depth()+2, where Fibn
is the nth Fibonacci number. To test this
condition, we use an array
to the Fibonacci sequence. This array is part of the aspect and is
within the lexical scopes of all of the methods defined in the
aspect. Since the length field in this implementation is an
int, the length of a
cord.String cannot exceed
231, and thus the depth of a balanced tree must be 44 or
less, since Fib47 > 231.
balance method itself is
introduced into the interface
cord.String. This does two
things: it introduces an abstract version of the method into the
String interface, and a concrete
version of the method into every class that directly implements the
String interface. In this case,
Substring also implement the
String interface, they also
it is not necessary for AspectJ to introduce the method into these
balance is written
in terms of several helper methods. These are also defined in the
Balance aspect. In some cases variants of the method can be defined
for the various classes. For example, if an
Empty string object is asked to
add itself to the balancing structure, the code is particularly
Although balancing is the only aspect that has been implemented to date, other possibilities have been identified.
At present, the code does not optimize the case of concatenating a
null string to the right of
Empty does optimize
the case of concatenating a string to the null string.) Inserting all
of the appropriate tests for the null string might possibly be done
as an aspect. Indeed, it is possible that Empty should itself have
been an aspect, although it seemed at the time that dealing with null
strings was part of the base functionality.
When the strings involved are sufficiently short, it does not make
sense to build these elaborate tree structures: it is more efficient
simply to copy the characters. In the Cedar rope
implementation (see below) "sufficiently short" means 24 characters
or less. The changes to
Flat.Concat necessary to simply
build a longer
Flat string object
rather than a
object could be added as an aspect.
As has been mentioned above, repeatedly calling
str.charAt(i) for i=0 to
str.length() is not an efficient way of programming a
str, because of the
time necessary to dive down into the appropriate sub-tree for each
perform method is
provided as an efficient alternative. However, old programming habits
die hard, and it might therefore be desirable to speed up
charAt by using a cache.
At any level of the tree, a
Substring node could keep a cache
of one or more piece records containing the following
somewhere in the tree below
endIndex of the substring of
this that is represented by
adjustment (a signed
integer) that must be added to an index
this, so that
These records would enable
charAt to skip all of the
intermediate levels in the tree in the event of a cache hit. An
aspect would allow the cache to be added to the representation of the
objects, and also provide a way of introducing the code that builds
and takes advantage of the cache.
The Xerox Cedar system provided an industrial strength implementation of strings in the Mesa language called ropes. A version of the Cedar system is available as a CD (Using Threads in Interactive Systems: A Case Study, by Carl Hauser, Christian Jacobi, Marvin Theimer, Brent Welch, and Mark Weiser, December 1993. Xerox PARC Technical Report CSL-93-16). A lighter-weight implementation of the same concept, in the C language called cords was undertaken by Hans Boehm and is available as part of a portable garbage collector (download one of the tar files and look at the cord subdirectory).
More information about the data structure is available online (page originally prepared by Hans Boehm) and in:
Boehm, Atkinson, and Plass, "Ropes: An Alternative to Strings", SoftwarePractice and Experience 25, 12 (Dec 1995), pp. 1315 - 1330.