CS302 Spr'99 Lecture Notes
Lecture 4

INTERPRETERS

An interpreter for a language L is a program P_L' that, given

$\bullet$ a description of a program Q_L (written in L), and

$\bullet$ an input I

behaves like Q_L on I.

L is the source language.

L', in which the interpreter is written, is the implementation language.

There are many possibilities for L' (including L itself!), but typically it will be a high-level language like C, Lisp, etc.

Important point is that P_L' is generic: it should work for any possible program Q_L.

Examples

(Note: any language can be interpreted, but the following usually are.)

BASIC, Pascal (via PCODE), Unix shells / PERL / Awk / TCL, Java, etc.

Pros and Cons

+ Easier to write than compiler; leverages high-level features of implementation language.

+ No compilation time overhead for users of source language; code-test-debug can be much quicker.

+ Portable, assuming implementation language is.

+ Provides a semantics, relative to implementation language.

- Interpretation is slower than running compiled code, mainly because decoding and dispatch are done in software, and because (ordinarily) very little optimization is done.

Continuum of possibilities between source interpretation and translation to machine code:

$\bullet$ Many systems translate source language to some intermediate language and then interpret that.

$\bullet$ Hardware processors can be viewed as ``interpreting'' machine code instructions (esp. if hardware is microcoded).

$\bullet$ Can build special-purpose hardware processors for specific languages (e.g., LISP machines, Java chips).
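To make "decoding and dispatch are done in software" concrete, here is a minimal sketch of an interpreter for an intermediate language. The stack machine, its opcode names (OP_PUSH, OP_ADD, OP_MUL, OP_HALT), and the function run are all invented for this illustration; they are not taken from any of the systems listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intermediate language: a tiny stack machine. */
enum Op { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct Instr {
    enum Op op;
    int arg;                          /* used only by OP_PUSH */
};

/* Interpret a program; returns the top of the stack at OP_HALT. */
int run(const struct Instr *code) {
    int stack[64];
    size_t sp = 0;                    /* number of values on the stack */
    for (size_t pc = 0; ; pc++) {     /* fetch */
        struct Instr in = code[pc];
        switch (in.op) {              /* decode and dispatch, in software */
        case OP_PUSH:
            assert(sp < 64);
            stack[sp++] = in.arg;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
}

/* (2 + 5) * 3, translated by hand into stack code. */
const struct Instr example[] = {
    {OP_PUSH, 2}, {OP_PUSH, 5}, {OP_ADD, 0},
    {OP_PUSH, 3}, {OP_MUL, 0}, {OP_HALT, 0}
};
```

Calling run(example) walks the six instructions. Every one of them pays for a fetch and a switch in addition to the arithmetic itself, which is the overhead that compiled code avoids.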

Defining Interpreters Using Attribute Grammars

Like other language processing, convenient to define interpreters using grammatical syntax framework.

As first step, can define interpreter using an attribute grammar.

Approach similar to semantics definitions, but instead of computing a translation (to machine code, functions, etc.), actually compute the value of the program within the grammar.

Thus we use a.g. formalism as our ``implementation language.''

(Next step will be to encode the a.g. into a ``real'' language, like C.)

Need to use both synthesized and inherited attributes.

Review of Attribute Grammars

Attribute Grammars (a.k.a. ``syntax-directed definitions'') allow convenient, concise definition of calculations on recursive structures.

Calculations are specified by describing their local behavior at parse tree nodes; structure of parse tree defines global shape of computation.

$\bullet$ Attach rules (a.k.a. ``attribute equations'') to grammar productions.

$\bullet$ Rules compute attribute values at corresponding parse tree node based on attribute values at

- parent nodes (inherited attributes), and/or

- child nodes (synthesized attributes)

Example:
\begin{code}\cdmath
A := B C
    $\uparrow$ A.syn := ... B. ... C. ...
    $\downarrow$ B.inh := ... A. ...
    $\downarrow$ C.inh := ... A. ...
\end{code}

$\bullet$ Terminals may have ``built-in'' attributes (think of them as being synthesized automatically).

$\bullet$ Types of attributes and form of rules vary widely.

A.G. Definitions are ``Self-Checking''

Must remember to define all needed attributes.

$\bullet$ If S.inh is an inherited attribute, it must be defined each time S appears in any grammar production right-hand side.
\begin{code}\cdmath
T := S$_1$ S$_2$
    $\downarrow$ S$_1$.inh := ... T ...
    $\downarrow$ S$_2$.inh := ... T ...
\end{code}

$\bullet$ If S.syn is a synthesized attribute, it must be defined each time S appears as a grammar production left-hand side.
\begin{code}\cdmath
S := T$_1$ T$_2$
    $\uparrow$ S.syn = ... T$_1$ ... T$_2$ ...
S := U$_1$ U$_2$ U$_3$
    $\uparrow$ S.syn = ... U$_1$ ... U$_2$ ... U$_3$ ...
\end{code}

Functional Attribute Grammars

Life is much nicer if we restrict the right-hand sides of attribute rules to be pure functions, i.e., calculations with no side-effects, because then

$\bullet$ The ``result'' of evaluation is just the value of the root node's synthesized attributes.

$\bullet$ Evaluation can occur in any order consistent with data dependencies among attribute rules.

Must avoid circularities in rules, e.g.:

\begin{code}\cdmath
A := B C
    $\uparrow$ A.x = B.x + 10
    $\downarrow$ ...

B := D E
    $\uparrow$ B.x = if D.flag then B.y + 2 else E.z
\end{code}

Precise definition of circularity can be subtle.
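One simple check is to look for a cycle in the attribute dependency graph of a single parse tree. The sketch below does this with depth-first search. The node names and the function is_circular are invented for the illustration, and the edge from B.y to A.x used in the usage note is a hypothetical stand-in for the inherited rule of the example, which is not fully shown in these notes.

```c
#include <assert.h>

/* Attribute instances of one (hypothetical) parse tree, as graph nodes. */
enum { A_X, B_X, B_Y, D_FLAG, E_Z, N_ATTRS };

/* dep[i][j] != 0 means: computing attribute i needs attribute j. */
typedef int DepGraph[N_ATTRS][N_ATTRS];

enum Mark { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first search; returns 1 if a cycle is reachable from node i. */
static int visit(DepGraph dep, enum Mark mark[], int i) {
    if (mark[i] == IN_PROGRESS) return 1;   /* back edge: circular */
    if (mark[i] == DONE) return 0;
    mark[i] = IN_PROGRESS;
    for (int j = 0; j < N_ATTRS; j++)
        if (dep[i][j] && visit(dep, mark, j)) return 1;
    mark[i] = DONE;
    return 0;
}

/* Returns 1 if the dependency graph contains a cycle, else 0. */
int is_circular(DepGraph dep) {
    enum Mark mark[N_ATTRS] = { UNVISITED };
    for (int i = 0; i < N_ATTRS; i++)
        if (visit(dep, mark, i)) return 1;
    return 0;
}
```

With edges A.x to B.x and B.x to D.flag, B.y, and E.z, the graph is acyclic; adding the assumed edge B.y to A.x closes the loop A.x, B.x, B.y and is_circular reports it. This check is conservative: it treats B.x as always needing B.y, although the conditional only needs it when D.flag is true. That gap is one reason a precise definition is subtle.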

Simple Expression Language with Local Binding

\begin{code}\cdmath
prog := exp

exp := NUM
exp := VAR

exp := exp$_1$ '+' exp$_2$
...
exp := exp$_1$ '*' exp$_2$

exp := LET VAR '=' exp$_1$ IN exp$_2$ END
\end{code}

Example:

\begin{code}\cdmath
let a = 2 + 5 in 14 + let b = a * 3 in b + 7 end end

$\Rightarrow$ 42
\end{code}

Attribute Grammar for Interpretation

\begin{code}\cdmath
prog := exp
    $\downarrow$ exp.env := empty
    $\uparrow$ ...
...
    ... extend(exp.env, VAR.var, exp$_1$.val)
    $\uparrow$ exp.val := exp$_2$.val
\end{code}

Attribute Grammar (Cont.)

Attributes:

Terminal NUM has .num attribute (number)

Terminal VAR has .var attribute (string)

Terminals '+','-',LET,'=',IN,END have no attributes.

Non-terminal exp has

- inherited env attribute (dictionary)

- synthesized val attribute (number)

A dictionary is a (functional) abstract data type supporting the following primitives:

\begin{code}\cdmath
empty:  dictionary
lookup: dictionary $\times$ string $\rightarrow$ number
extend: dictionary $\times$ string $\times$ number $\rightarrow$ dictionary
\end{code}
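One possible realization of such a functional dictionary in C is an immutable linked list of bindings. This is only a sketch: the names Binding and Dict are invented here, the third primitive is called extend to match the C declarations later in these notes, and treating lookup of an unbound name as a failed assertion is an assumption, since the notes do not say what should happen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A dictionary is an immutable linked list of bindings; NULL is empty. */
typedef struct Binding {
    const char *name;
    int value;
    const struct Binding *rest;
} Binding;
typedef const Binding *Dict;

const Dict empty = NULL;

/* extend: returns a new dictionary; the old one is left unchanged.
   Nodes are never freed in this sketch. */
Dict extend(Dict d, const char *name, int value) {
    Binding *b = malloc(sizeof *b);
    assert(b != NULL);
    b->name = name;
    b->value = value;
    b->rest = d;
    return b;
}

/* lookup: value of the most recently added binding of name.
   An unbound name is treated as a programming error (assumption). */
int lookup(Dict d, const char *name) {
    for (; d != NULL; d = d->rest)
        if (strcmp(d->name, name) == 0) return d->value;
    assert(!"unbound variable");
    return 0;
}
```

Because extend never modifies its argument, an inner binding of a name shadows the outer one only in the new dictionary; any rule still holding the old dictionary sees the old value. This is what makes the dictionary safe to pass around as an attribute value in a functional attribute grammar.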

Imperative Evaluation Strategies

Functional attribute grammars have nice properties, but can make it awkward to deal with imperative features of languages, such as input/output and assignment statements.

Alternative: fix the evaluation order of attributes, so that we can safely include imperative statements (side effects) in the ``attribute equations'' section.

Default order: depth-first, left-to-right, but must obey data dependencies.

$\bullet$ First evaluate children's inherited attributes.

$\bullet$ Then recursively evaluate children, obtaining their synthesized attributes.

$\bullet$ Finally evaluate own synthesized attributes.

$\bullet$ Can perform side effects at any point specified (though there is no standard way to express this).

Example

Add variable update (via an update primitive for the dictionary ADT) and printing (via a write primitive).

\begin{code}\cdmath
prog := exp
    $\downarrow$ exp.env := empty
    $\uparrow$ ...
...
    ... update(exp.env,VAR.var,exp$_1$.val)
    $\uparrow$ exp.val := exp$_1$.val
\end{code}

Implementing Imperative Attribute Grammars

It is easy to turn imperative attribute grammars into recursive descent C programs that process tree data structures. (Type-checker was one example.)

$\bullet$ Each nonterminal N gets corresponding C function N.

$\bullet$ Inherited attributes of N become extra arguments to the function N.

$\bullet$ Synthesized attributes of N become return values from the function N.

$\bullet$ Follow evaluation order described previously.

$\bullet$ Side effects are executed wherever encountered.
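The recipe can be sketched on a grammar smaller than the expression language. Everything in this example is hypothetical: the grammar tree := LEAF | tree_1 tree_2, the inherited attribute depth, the synthesized attribute "largest leaf depth", and the global trace buffer that stands in for an arbitrary side effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grammar:  tree := LEAF  |  tree_1 tree_2 */
typedef struct Tree {
    char label;                 /* built-in attribute of LEAF */
    struct Tree *left, *right;  /* both NULL for a leaf */
} Tree;

char trace[32];     /* side-effect target: leaf labels in visit order */
size_t trace_len;

/* Nonterminal tree becomes function tree().
   Inherited attribute depth        -> extra argument.
   Synthesized largest leaf depth   -> return value. */
int tree(const Tree *t, int depth) {
    if (t->left == NULL) {                 /* tree := LEAF */
        assert(trace_len + 1 < sizeof trace);
        trace[trace_len++] = t->label;     /* side effect, where written */
        trace[trace_len] = '\0';
        return depth;
    }
    /* tree := tree_1 tree_2 */
    int l = tree(t->left, depth + 1);      /* child's inherited attr, recurse */
    int r = tree(t->right, depth + 1);     /* children go left to right */
    return l > r ? l : r;                  /* finally, own synthesized attr */
}
```

For a root whose left child is leaf a and whose right child has leaves b and c, calling tree(&root, 0) visits the leaves in the order a, b, c. The order in which labels land in trace is fixed by the evaluation order, which is exactly why that order has to be pinned down once rules may have side effects.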

C version of Example

First the data structure:

\begin{code}
typedef char* id;
typedef struct ExpS *Exp;
struct ExpS {
  enum {...
  ...
    ... v; Exp e1, e2; } let;
    struct { id v; Exp e; } assign;
  } u;
};
\end{code}

Assume suitable operations on environments and I/O:
\begin{code}
typedef ... Env;
static Env empty;
int lookup(Env, id);
Env extend(Env,id,int);
void update(Env,id,int);

void write(int);
\end{code}

C version (Cont.)

The actual evaluation code:

\begin{code}
int eval(Env env,Exp exp) {
  switch (exp->kind) {
  case Num : retu...
  ...
    ...exp->u.assign.e);
    update(env,exp->u.assign.v,v);
    return v;}
  }
}
\end{code}
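For experimentation, the evaluator can be filled out into a self-contained program. The following is a sketch under stated assumptions, not the original course code. Only kind, Num, u.let, u.assign, lookup, extend, update, and the shape of the assignment case come from the notes. The kind names Var, Add, Mul, Let, Assign, Write, the bin and write fields, the linked-list Env, the constructor functions, and the choice that a write expression returns the value it prints are all invented here. The print primitive is named write_int rather than write to avoid colliding with the POSIX write function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef const char *id;   /* const added so string literals can be used */
typedef struct ExpS *Exp;
struct ExpS {
    enum { Num, Var, Add, Mul, Let, Assign, Write } kind;
    union {
        int num;                            /* Num */
        id var;                             /* Var */
        struct { Exp e1, e2; } bin;         /* Add, Mul (assumed layout) */
        struct { id v; Exp e1, e2; } let;   /* Let */
        struct { id v; Exp e; } assign;     /* Assign */
        Exp write;                          /* Write (assumed layout) */
    } u;
};

/* Environment: linked list of mutable bindings; NULL is empty. */
typedef struct EnvS { id v; int val; struct EnvS *next; } *Env;
Env empty = NULL;

static Env find(Env env, id v) {
    for (; env != NULL; env = env->next)
        if (strcmp(env->v, v) == 0) return env;
    assert(!"unbound variable");
    return NULL;
}
int lookup(Env env, id v) { return find(env, v)->val; }
void update(Env env, id v, int val) { find(env, v)->val = val; }
Env extend(Env env, id v, int val) {
    Env e = malloc(sizeof *e);
    assert(e != NULL);
    e->v = v; e->val = val; e->next = env;
    return e;
}
void write_int(int n) { printf("%d\n", n); }

/* Inherited env is an argument; synthesized val is the return value. */
int eval(Env env, Exp exp) {
    switch (exp->kind) {
    case Num: return exp->u.num;
    case Var: return lookup(env, exp->u.var);
    case Add: { int a = eval(env, exp->u.bin.e1);      /* left operand first */
                return a + eval(env, exp->u.bin.e2); }
    case Mul: { int a = eval(env, exp->u.bin.e1);
                return a * eval(env, exp->u.bin.e2); }
    case Let: { int v = eval(env, exp->u.let.e1);
                return eval(extend(env, exp->u.let.v, v), exp->u.let.e2); }
    case Assign: { int v = eval(env, exp->u.assign.e);
                   update(env, exp->u.assign.v, v);
                   return v; }
    case Write: { int v = eval(env, exp->u.write);
                  write_int(v);
                  return v; }
    }
    assert(!"unknown expression kind");
    return 0;
}

/* Hypothetical constructors for building trees by hand. */
static Exp mk(int kind) {
    Exp e = calloc(1, sizeof *e);
    assert(e != NULL);
    e->kind = kind;
    return e;
}
Exp num(int n) { Exp e = mk(Num); e->u.num = n; return e; }
Exp var(id v) { Exp e = mk(Var); e->u.var = v; return e; }
Exp bin(int kind, Exp a, Exp b) {
    Exp e = mk(kind); e->u.bin.e1 = a; e->u.bin.e2 = b; return e;
}
Exp let(id v, Exp e1, Exp e2) {
    Exp e = mk(Let); e->u.let.v = v; e->u.let.e1 = e1; e->u.let.e2 = e2;
    return e;
}
Exp assign(id v, Exp e1) {
    Exp e = mk(Assign); e->u.assign.v = v; e->u.assign.e = e1; return e;
}
```

Building the tree for let a = 2 + 5 in 14 + let b = a * 3 in b + 7 end end with these constructors and passing it to eval with the empty environment reproduces the value from the earlier example. The operands of Add and Mul are evaluated in separate statements because C does not fix the evaluation order of the operands of + and *, and with assignment in the language that order is observable.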

Andrew P. Tolmach
1999-04-07