A Taxonomy of meta-programming systems.

In a meta-programming system meta-programs manipulate object-programs. Meta-programs may construct object-programs, combine object-program fragments into larger object-programs, observe the structure and other properties of object-programs, and execute object-programs to obtain their values.

There are two important kind of meta-programming scenarios: program generators, and program analyses. Each of these scenarios has a number of distinguishing characteristics

Generator
1. Representation: Strings vs. Algebraic datatype vs. Quasi-quote
2. Automatic vs. Manual annotation
3. Static Generator vs. Runtime Generator
4. Homogeneous vs. Heterogeneous
5. Typed vs. un-Typed
  1. Statically Typed vs. Dynamically Typed
6. Two-stage vs. N-stage
Analysis
1. Homogeneous vs. Heterogeneous
2. Higher Order Abstract Syntax vs. First Order Syntax
3. Typed vs. un-Typed

Generators vs. Analyses

A program generator (a meta-program) solves a particular problem by constructing another program (an object-program) that solves the problem at hand. Usually the generated (object) program is "specialized" for the particular problem and uses less resources than a general purpose, non-generator solution.

A program analysis (a meta-program) observes the structure and environment of an object-program and computes some value as a result. Results can be data- or control-flow graphs, or even another object-program with properties based on the properties of the source object-program. Examples of these kind of meta-systems are: program transformers, optimizers, and partial evaluation systems.

Representation: Strings vs. Algebraic datatype vs. Quasi-quote

A meta-programming system uses program annotations (called staging annotations) to distinguish between the meta-program from the object program. An object program should be a first-class value. One should think of it as a datastructure that can be manipulated like any other. Many meta-systems represent object-programs by using strings, graphs, or algebraic data-structures.

With the string encoding, we represent the code fragment f (x,y) simply as "f(x,y)". While constructing and combining fragments represented by strings can be done concisely, deconstructing them is quite verbose. More seriously, there is no automatically verifiable guarantee that programs thusly constructed are syntactically correct. For example, "f (,y)" can have the static type string, but this clearly does not imply that this string represents a syntactically correct program.

With the datatype encoding, we can address the syntactic correctness problem. A datatype encoding is essentially the same as what is called abstract syntax or parse trees. The encoding of the fragment "f(x,y)" in an SML datatype might be: Apply(Variable "f",Tuple [Variable "x" ,Variable "y"]) using a datatype declared as follows:

datatype exp = Variable of string
             | Apply of (exp * exp)
             | Tuple of exp list

Using a datatype encoding has an immediate benefit: correct typing for the meta-program ensures correct syntax for all object-programs. Because SML supports pattern matching over datatypes, deconstructing programs becomes easier than with the string representation. However, constructing programs is now more verbose because we must use the cumbersome constructors like Variable, Apply, and Tuple.

The Quasi-quote representation is an attempt to over come this limitation. Here the actual representation of object-code is hidden from the user by the means of a quotation mechanism. Object code is constructed by placing "quotation" annotations around normal code fragments.

In the remainder of this document we will use the quasi-quotation notation of the MetaML meta-programming system. In MetaML quasi-quotation is just one of several staging annotations. In MetaML the staging annotations are Brackets < >, Escape ~, lift, and run. An expression < e > builds the code representation of e; ~ e splices the code obtained by evaluating e into the body of a surrounding Bracketed expression; and run e evaluates e to obtain a piece of code, and then evaluates this piece of code. It is important to note that ~ e is only legal within lexically enclosing Brackets.

As with the datatype representation of programs, using staging annotations guarantees the the syntactic correctness of object programs using the the type correctness of the meta-programs, but maintains the ease of construction of object-programs. With a staged type system (see below) the type correctness of meta-programs can also guarantee the type correctness (as well as the syntactic correctness) of object programs.

Automatic vs. Manual

We call a meta-programming system where the programmer places the staging annotations directly a manual staging system. If the staging annotations are place by an automatic process, then the meta-programming system is an automatic staging system.

Off-line partial evaluation is an automatic staging meta-system. Consider the simple partial evaluation function PE with type < s -> d -> a > -> (s -> <d -> a>)

PE takes the representation of a program with one static parameter and one dynamic parameter and returns an answer. It analyses this program and automatically produces a annotated program which when given the static parameter as input produces a representation of the function from dynamic parameter to answer. For example:

PE <fn s => fn d => d + (s + 3)>

evaluates to

fn s => <fn d => d + ~(lift (s+3))>

Automatic meta-programming systems save the user the bother of placing the annotations, but the user of an automatic system loses control over the structure of the output.

It has been argued that manually staged programs are hard to write, and are much larger than their un-staged counterparts that could be input into a PE system, thus saving the user a lot of work. With the advent of modern meta-programming systems with quasi-quote staging annotations it remains to be seen if this argument still holds. Our experience has been that manually annotated programs are (within a few percent) the same size as their unstaged counterparts.

Static Generator vs. Runtime Generator

Program generators come in two flavors: static generators, which generate code which is then “written to disk” and processed by normal compilers etc. and run-time code generators which are programs that write or construct other programs, and then immediately execute the programs they have generated. Examples of program generators are run-time code generation systems like the Synthesis Kernel, and static program generators such as Yacc.

Homogeneous vs. Heterogeneous

There are two kinds of meta-programming systems: homogeneous systems where the meta-language and the object language are the same, and heterogeneous systems where the meta-language is different from the object-language.

Both kinds of systems are useful for representing programs for automated program analysis and manipulation. But there are important advantages to homogeneous systems. Only homogeneous systems can be multi-level, where an object-program can itself be a meta-program that manipulates second-level object-programs. Only in a homogeneous meta-system can a single type system type both the meta-language and the object-language. Only homogeneous meta-systems can support reflection (where the meta-language is rich enough that every primitive on object-programs can be expressed as a meta-program). Only homogeneous meta-systems can support a "run" or "eval" operation in a uniform way. This is what makes run-time code generation possible. Homogeneous systems also have the important pedagogical and usability property that the user need only learn a single language.

Typed vs. un-Typed

To automate the meta-programming process and to make it less error prone, it must be possible to statically infer some properties of the object programs from the meta-programs that manipulate them. An important advance provided by the meta-programming system MetaML is a staged type system. A staged type system combines the important meta-programming information of staging and types into one system.

A staged type system states precisely the staging constraints of the system, as well as the typing requirements of each stage. One advantage of a staged type system is that meta-programs that manipulate ill-typed object-programs are themselves ill-typed. This provides great feedback to the writers of meta-programs.

Staged type systems are particularly appropriate for homogeneous systems, because only a single type system is necessary. Such systems capture the static versus dynamic parameter distinction of partial evaluation. Thus a program f with type: f :: a -> <d> -> < b -> <c> > tells us quite a bit about the program.

It says that f is a function with two parameters. the first is a static (known now) parameter of type "a", and the second is dynamic (not known until the second stage) parameter of type "d". This program produces an object program, which is also a (second-level) meta-program, with a single (second-stage) static parameter of type "b" which produces a (third-stage) object program of type "c".

Typed heterogeneous meta-systems are also important. Here the type of the meta-program somehow incorporates the types of the object language. This allows meta-programs to encode only type-consistent (at the object-level) transformations of object code, or type consistent translations from one object-language to another. In this case there are three type systems which must work harmoniously (the meta-level types system, and the two object-level type systems).

Statically Typed vs. Dynamically Typed

Two-stage vs. N-stage

A two-stage meta-programming system allows only a single meta-stage, and a single object-stage. In an N-stage meta-programming system any object-program can also be a meta-program.

Human-Machine Interface

In addition to these operational properties, a meta-system should also have a good human-system interface. Our experience with meta-programming systems has led us to formulate the following high-level meta-programming human-system interface desiderata:

Construction. It should be easy to construct code using some sort of pattern-based object-code templates. Templates should “look like” the object language they represent.
Splicing. Program fragments should be easy to combine into larger program fragments, this is best accomplished by a parameterizable splicing mechanism such as a ``template-with-holes".
Typing. In a homogeneous system, object-code has a parametric type, i.e., there is code with type Int, code with type Float, etc. Type correctness of the meta-program, should guarantee type correctness of the object-programs it constructs.
Hygiene. Bound variables in templates should be handled in some sophisticated way which guarantees no name clashes, and which obeys the rules of static scoping. Free variables in program templates should refer to the value of the variable at the static location where the template is defined, not where it is eventually executed.
Run. Object-programs can be run. Generated code can be “tested” inside the meta-system.
Printing. Object-programs can be printed. This is essential for debugging of meta-programs. The object-programs should be pretty-printed in the manner in which they are normally written by a programmer. They should not be represented by some abstract representation unfamiliar to the programmer.
Observation. Object-programs have structure. It should be possible to analyze object-programs, take them apart, etc.