
CS302 Spr'99 Lecture Notes
Lecture 9

Procedure Parameter Passing

$\begin{code}TYPE IARRAY IS ARRAY OF INTEGER; PROCEDURE f(x:INTEGER,y:INTEGER) IS... ... IARRAY, w: INTEGER . . . f(3,w); ... g(a,a); ... f(17+5,a[3]); . . . \end{code}$

$\bullet$ Do we pass addresses (l-values) or contents (r-values) of variables?

$\bullet$ How do we pass actual values that aren't variables?

$\bullet$ What does it mean to pass an aggregate value like an array?

Call-by-Value (i.e., r-value)

$\bullet$ Each actual argument is evaluated to a value before call.

$\bullet$ On entry, value is bound to formal parameter just like a local variable.

$\bullet$ Updating formal parameter doesn't affect actuals in calling procedure.

$\begin{code}double hyp(double a,double b) \{ a = a * a; b = b * b; return sqrt(a+b); \}\end{code}$
$\bullet$ Simple; easy to understand!

$\bullet$ Implement by copying.

Problems with Call-by-Value

$\bullet$ Can be inefficient if value is large.

Example: Calls to dotp copy 20 doubles:
$\begin{code}typedef struct \{double a1,a2,...,a10;\} vector; double dotp(vector ... ...w.a2 + ... + v.a10 * w.a10; \} vector v1,v2; double d = dotp(v1,v2);\end{code}$
$\bullet$ Cannot affect calling environment directly. (Often a good thing, but not always!)

Example: calls to swap have no effect:
$\begin{code}void swap(int i,int j) \{ int t; t = i ; i = j; j = t; \} . . . swap(a[p],a[q]);\end{code}$
$\bullet$ Can at best return only one result (as a value), with same efficiency problems.

Call-by-Reference (i.e., l-value)

$\bullet$ Pass the address (l-value) of each actual parameter.

$\bullet$ On entry, the formal is bound to the address, which must be dereferenced to get value, but can also be updated.

$\bullet$ If actual argument doesn't have an l-value (e.g., ``2 + 3''), either:

- Evaluate it into a temporary location and pass address of temporary, or

- Treat as an error.

$\bullet$ Now swap, etc., work fine!

$\bullet$ Accesses are slower.

$\bullet$ Lots of opportunity for aliasing problems, e.g.,

$\begin{code}PROCEDURE matmult(a,b,c: MATRIX) ... (* sets c := a * b *) \par matmult(a,b,a) (* oops! *)\end{code}$
$\bullet$ Call-by-value-result (a.k.a. copy-restore) addresses this problem, but has other drawbacks.

Hybrid Methods

$\bullet$ Pascal, Ada, and similar languages allow the programmer to specify which arguments are to be handled as call-by-value (cbv), and which as call-by-reference (cbr). E.g., in Pascal, arguments are cbv by default, and cbr if the VAR keyword is used.

$\bullet$ In C, programmers can take the l-value of a variable explicitly, and pass that to obtain cbr-like behavior:
$\begin{code}void swap(int *a, int *b) \{ int t; t = *a; *a = *b; *b = t; \} ... swap(&a[p],&a[q]);\end{code}$
$\bullet$ C++ combines both these options:
$\begin{code}void swap(int &a, int *b) \{ int t; t = a; a = *b; *b = t; \} ... swap(a[p],&a[q]);\end{code}$
Mixing explicit and implicit pointers can be very confusing!

Records and Arrays

To understand argument passing of record and array types, must know what the language considers an r-value of these types to be!

$\bullet$ In Pascal, r-values of both arrays and records are the actual contents. So passing a record or array by value means copying the contents, whereas passing by reference (VAR parameter) doesn't.

$\bullet$ In PCAT, r-values of both records and arrays are pointers to the actual contents. PCAT has only call-by-value, but this doesn't actually cause copying, even for record or array values.

$\bullet$ In ANSI C/C++, struct r-values are the actual contents, but array r-values are pointers to the contents.

In this example, no doubles are copied on call:

$\begin{code}typedef double vector[10]; double dotp(vector v, vector w) \{ doubl... ...d += v[i] * w[i]; return d; \} vector v1,v2; double d1 = dotp(v1,v2);\end{code}$
Records and Arrays (continued)

To avoid copying C structures, must use pointers:
$\begin{code}typedef struct \{double a1,a2,...,a10;\} vector; double dotp(vector ... ...+ ... + v->a10 * w->a10; \} vector v1,v2; double d1 = dotp(&v1,&v2);\end{code}$

$\bullet$ These issues also affect assignment, of course.

Substitution

$\bullet$ Can often get the effect we want using substitution, i.e., macro-expansion, e.g. (in C):
$\begin{code} ... \end{code}$
$\bullet$ BUT blind substitution is dangerous because of possible ``variable capture,'' e.g.,
$\begin{code}swap(a[t],a[q])\end{code}$
expands to
$\begin{code}\{int t; t = a[t]; a[t] = a[q]; a[q] = t;\}\end{code}$
Here the ``t'' in the argument a[t] is ``captured'' by the declaration in the macro, and is uninitialized at its first use.

$\bullet$ Really want ``substitution with renaming where necessary'' = Algol-60's call-by-Name facility.

$\bullet$ Flexible, but potentially very confusing, and inefficient to implement.

$\bullet$ If language has no updatable variables (as in ``pure'' functional languages), substitution gives a beautifully simple semantics for procedure calls.

Procedures as Parameters

It can be handy to pass procedures as parameters to other procedures. This feature is supported by Pascal and similar languages, and by C (via function pointers).

Example (pseudo-PCAT)
$\begin{code}TYPE INTLIST IS RECORD x: INTEGER; next: INTLIST; END; TYPE INT_A... ...INTLIST; do_intlist(a, print_int); do_intlist(a, sum_int); WRITE(sum);\end{code}$
Same Example in C

$\begin{code}typedef struct intlist \{ int x; struct intlist *next; \} *Intlist... ... just print_int */ do_intlist(a, &sum_int); printf(''\%d\\ n'', sum); \end{code}$

Using Local (Nested) Procedures

$\bullet$ Sometimes want to pass local functions as parameters.

Example: Improved version of sum:
$\begin{code}PROCEDURE sum_list(a:INTLIST) : INTEGER IS VAR sum := 0; PROCEDURE... ...sum_int); RETURN sum; END; VAR c: INTLIST; ...WRITE(sum_list(c));...\end{code}$

$\bullet$ Here sum_int operates on the value of variable sum, which is neither local nor global.

$\bullet$ Solution: pass pair of (code-pointer,static-link) as ``value'' of procedure.

$\bullet$ Must guarantee that static link is still valid when procedure is called!

$\bullet$ Cannot express this in C.

More Nested Procedures

Example: Use iterator to count how many times specified integer occurs.

$\begin{code}PROCEDURE count(i:INTEGER;a:INTLIST) IS VAR sum := 0; PROCEDURE ch... ...st(a,check_int); RETURN sum; END; VAR c: INTLIST; ...count(17,c);...\end{code}$

$\bullet$ Here check_int depends on the value of variable i, which is neither local nor global.

$\bullet$ Going one step further, can be handy to treat procedure values just like other values, e.g., to return them as function results or store them into variables.

``First-class'' Procedures Example
$\begin{code}TYPE COUNTER IS PROCEDURE(b:INTLIST) : INTEGER; PROCEDURE make_count... ...ences of 17''); WRITE(''d has '', g(d), ''occurrences of 17''); END;\end{code}$
Applying First-class Procedures

Example: Table-driven Command Processor

$\begin{code}TYPE COMMAND IS ENUMERATION (SUMALL = 0, COUNT42S, COUNT101S, ...)... ...INTLIST) IS BEGIN WRITE (''Answer is '', actions[command] (a)); END\end{code}$

$\bullet$ Often handy for ``call-backs'' from operating system or window system.

Problems with first-class procedures

Consider activation tree for make_counter example:
$\begin{code}main /\\ / \\ / \\ make_counter(17) g(c) == count(c) \vert ... ..._int) \vert \vert {\rm (requires value {\tt {i}} = 17)} check_int(j)\end{code}$
Activation of make_counter is no longer live when count is called!

If i is stored in the activation record for make_counter and that activation record is stack-allocated, it will be gone at the point where check_int needs it!

To avoid this problem:

$\bullet$ Pascal prohibits ``upward funargs;'' procedure values can only be passed downward, and can't be stored.

$\bullet$ Some other languages only permit ``top-level'' procedures to be manipulated as procedure values (in C, this means all procedures!).

Heap Storage for Procedure Values

$\bullet$ Languages supporting first-class nested procedures (e.g., Lisp, Scheme, ML, Haskell, etc.) solve problem by using heap to store variables like i.

$\bullet$ Simple solution: Just put all activation records in the heap to begin with! (Garbage collection is a must!)

$\bullet$ More refined solution: Represent procedure values by a heap-allocated ``closure'' record, containing the procedure's code pointer and values of the non-local variables referenced by the procedure.

$\bullet$ Involves taking copies of the values of non-local variables, so only works when values are immutable.

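$\bullet$ Can always introduce extra level of indirection to achieve this: keep each mutable variable in a heap cell and copy the (immutable) pointer to that cell into the closure.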

Functional Programming

What does functional mean? Two main senses:

$\bullet$ Programs consist of functions with no side effects.

$\bullet$ Functions are supported as ``first-class'' values.

Claim: functional programs are:

$\bullet$ clearer;

$\bullet$ easier to get right;

$\bullet$ easier to test;

$\bullet$ easier to transform;

$\bullet$ easier to parallelize;

$\bullet$ easier to prove things about.

Important examples:

$\bullet$ Lisp, Scheme (``strict'', dynamically typed, impure)

$\bullet$ Standard ML, CAML (``strict'', statically typed, impure)

$\bullet$ Haskell, Gofer (``lazy'', statically typed, pure)

ML Example: Dictionaries with Sorted Lists

$\begin{code}- type dict = (int*string) list; - fun insert (d:dict) (k:int,v:string... ...k andalso member t k); \par - val q = member b 4; val q = false : bool\end{code}$

Features

$\bullet$ Identifiers denote values, not changeable variables.

$\bullet$ insert is a function that returns a new list without changing its argument!

$\bullet$ Control structure is recursion, not iteration.

$\bullet$ Definition of lists is built into language; other similar data structures can be user-defined.

$\bullet$ List nodes (and other data structures) are tested and destructured using pattern matching.

Why bother?

Functions can't have side-effects. Therefore, they can't have dangerous, hidden side-effects!

Consider this C fragment:
$\begin{code}insert(1,"q",t); x = f(t); /* source not here... */ s = member(1,t);\end{code}$
Will s be true? It depends on whether f modifies t!

Compare this functional code:
$\begin{code}val t' = insert (1,''q'') t val x = f t' val s = member 1 t'\end{code}$
Here s must be true, because f can't modify its argument (or anything else)!

Also compare:
$\begin{code}val t' = insert (1,''q'') t val (x,t'') = f t' val s = member 1 t''\end{code}$
It's testable.

$\bullet$ True functions can always be tested separately.

$\begin{code}fun appendx k t = let val v = find k t val t' = delete k t in insert (k,v ^ ''x'') t' end\end{code}$

If appendx is wrong, then its definition is wrong, or find or delete or insert must be wrong.

The problem can't be due to a hidden interaction between find, delete, and/or insert.

Any coupling between functions must be made explicit in their arguments or return values.

This helps discourage coupling!

(In principle, debugging by divide and conquer can even be automated. In practice, conventional trouble-shooting is much easier.)

Andrew P. Tolmach
1999-05-10