Declarations and Expressions

An ML program is essentially a collection of declarations and expressions. (We can think of the "main" program as some particular declaration or function of interest, which makes use of the other declarations.)

Declarations bind identifiers to values, e.g.,

code206

or to functions, e.g.,

code209

Both values and function bodies are specified as expressions.

Every expression and identifier has a type. We can (but almost never need to) associate types explicitly with expressions or identifiers using the notation :type.
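As a minimal sketch of both kinds of declaration, with and without explicit :type annotations (the names limit, double, ratio and half are illustrative assumptions, not taken from the original examples):

```sml
(* Illustrative names only; not from the original slides. *)

(* A value declaration: binds the identifier limit to the value 10. *)
val limit = 10

(* A function declaration: binds double to a function on integers. *)
fun double n = 2 * n

(* The same kinds of declaration with optional :type annotations. *)
val ratio : real = 0.5
fun half (x : real) : real = x * ratio

(* Both the bound value and the function body are ordinary expressions. *)
val twenty = double limit
```

The annotations are accepted but not required; the compiler infers the same types without them.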

The main kinds of expressions are as follows:

- Constants, e.g.,

code52

- Identifiers ("variables").

code54

Expressions (continued)

- Constructor applications

code57

- Function and operator applications

code59

- Let-bindings

code61

- Conditionals and case expressions

code63

- Anonymous functions

code212
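The kinds of expressions listed above can be sketched in one place as follows. Every name and value here is a hypothetical illustration, chosen only to show one instance of each form:

```sml
(* Illustrative names only; not from the original slides. *)
val c1 = 42                                     (* constant: integer *)
val c2 = "hello"                                (* constant: string *)
val v = c1                                      (* identifier *)
val pair = (1, true)                            (* constructing a pair *)
val xs = 1 :: 2 :: nil                          (* constructor application: list cons *)
val sum = c1 + 1                                (* operator application *)
val len = size c2                               (* function application *)
val area = let val r = 3 in r * r end           (* let-binding *)
val sign = if sum > 0 then 1 else ~1            (* conditional *)
val first = case xs of x :: _ => x | nil => 0   (* case expression *)
val inc = fn n => n + 1                         (* anonymous function *)
```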

Evaluation

ML programs execute by evaluating expressions into values.

Since ML is an "eager" language, an expression is evaluated whenever:

- it is bound to an identifier in a declaration; or

- it is specified as an argument to a function or operator application; or

- it is specified as an argument to a data constructor; or

- it is subjected to a conditional or case operation; or

- it is being returned as the value of a function.

Where there's a choice (e.g., in evaluating the arguments to a pair expression), SML always evaluates left-to-right.

The only places an expression isn't evaluated "immediately" are when:

- it appears as the body of a function (it is evaluated only when the function is applied); or

- it appears as one arm of a conditional or case expression (it is evaluated only if that arm is selected).
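These rules can be observed with a small experiment. The sketch below assumes a hypothetical helper, note, that records a label in a mutable list before returning its argument; trace, note and boom are illustrative names only:

```sml
(* Illustrative names only; not from the original slides. *)

(* trace records the order in which sub-expressions are evaluated. *)
val trace : string list ref = ref []

(* note records the label s, then returns v unchanged. *)
fun note (s : string) v = (trace := s :: !trace; v)

(* The components of a pair are evaluated left to right. *)
val p = (note "left" 1, note "right" 2)
val order = rev (!trace)     (* ["left", "right"] *)

(* A function body is not evaluated when the function is defined,
   so this declaration does not raise Div. *)
fun boom () = 1 div 0

(* Only the selected arm of a conditional is evaluated,
   so boom is never applied here. *)
val safe = if true then 0 else boom ()
```

Applying boom () directly would raise the Div exception, since only then is its body evaluated.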

What does it mean to evaluate an expression?

Let-bindings

Declarations can appear:

- at top-level, where they become globally visible in all subsequent top-level entries; or

- in let expressions.

The purpose of let is to restrict the scope of a declaration to a limited part of the program.

A let is often used inside functions, much like "local variables" in other languages:

code73

But a let expression can appear anywhere an expression can:

code77

A let expression is always evaluated by evaluating the RHS of the declaration to a value, binding that value to the declared identifier and then evaluating the body of the expression.
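A minimal sketch of both uses of let, as a local variable inside a function and as a sub-expression of a larger expression (sumOfSquares, aa, k and n are hypothetical names):

```sml
(* Illustrative names only; not from the original slides. *)

(* let used like a local variable: aa is visible only between in and end. *)
fun sumOfSquares (a, b) =
  let
    val aa = a * a
  in
    aa + b * b
  end

(* let used in the middle of a larger expression:
   5 is bound to k, then the body k * 2 is evaluated, giving 10. *)
val n = 1 + (let val k = 5 in k * 2 end)    (* 11 *)
```

Outside the let, neither aa nor k is in scope.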

Declaration Lists

A sequence of declarations following the let keyword is evaluated in order, just as if it were a nested sequence of lets, e.g.:

code83

is just like:

code85
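To illustrate the equivalence, here is a sketch with hypothetical bindings x and y, written once as a declaration list and once as nested lets:

```sml
(* Illustrative names only; not from the original slides. *)

(* A declaration list: y's right-hand side can already see x. *)
val a = let val x = 2  val y = x + 1 in x * y end

(* The same computation written as explicitly nested lets. *)
val b = let val x = 2 in let val y = x + 1 in x * y end end
```

Both a and b evaluate to 6.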

When we begin to consider recursion, things will get a little more complicated.

Scope in the Interactive System

Declarations entered at the read-eval-print loop are semantically like nested let declarations, where the scope of the declaration extends indefinitely (until the end of the interactive session). E.g.,

code89

is equivalent to

code91

Or, more compactly:

code93
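One consequence of this nested-let reading is that a later top-level declaration of the same name simply shadows the earlier one from that point on. A small sketch with hypothetical identifiers x, y and z:

```sml
(* Illustrative names only; not from the original slides. *)
val x = 1
val y = x + 1      (* sees x = 1, so y is 2 *)
val x = 10         (* a new binding; it shadows the earlier x from here on *)
val z = x + y      (* sees x = 10 and y = 2, so z is 12 *)
```

The value of y is unaffected by the second declaration of x, because y was already computed in the scope of the first.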

Cases and Conditionals

A case expression allows a value of a constructed type (pairs, lists, etc.) to be analyzed into its components.

Each rule of a case is specified by giving a pattern and an expression to be evaluated if that pattern matches the data value being "cased over". The pattern specifies:

- Which data constructor was used to construct the value

- Identifier names to be bound to the subcomponents of the value

code100

The first rule matches list values constructed with :: (i.e., non-empty ones), and binds the head element of the list to x and the remainder of the list to y. These variables can then be used on the right-hand side of the matching rule (only).

The second rule matches list values constructed with nil (i.e., empty ones); since such values have no sub-components, there are no variables in the pattern.
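A case expression with the two rules just described might look like the following sketch. The function name firstPlusCount and its result are assumptions made for illustration; only the shape of the patterns follows the description above:

```sml
(* Illustrative names only; not from the original slides. *)
fun firstPlusCount (l : int list) : int =
  case l of
    x :: y => x + length y   (* non-empty: x is the head, y the remaining list *)
  | nil    => 0              (* empty: no sub-components, so no variables *)

val r1 = firstPlusCount [5, 6, 7]   (* 5 + 2 = 7 *)
val r2 = firstPlusCount []          (* 0 *)
```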

We'll see much more sophisticated patterns later.

Conditionals; Derived Forms

Conditional expressions analyze a boolean-valued expression and, depending on the outcome, evaluate one of two sub-expressions:

code108

Note that both the then and else expressions must always be specified, so that the if expression as a whole always has a value.

In fact, the if expression is really just a (syntactic) shorthand for a case expression over the boolean type. E.g.,

code116
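As a hedged illustration of this shorthand, the two hypothetical functions below compute an absolute value, once with if and once with the corresponding case over the boolean constructors true and false:

```sml
(* Illustrative names only; not from the original slides. *)
fun absIf (n : int) = if n < 0 then ~n else n

(* The same function with the conditional written out as a case. *)
fun absCase (n : int) = case n < 0 of true => ~n | false => n
```

Both functions return 3 when applied to ~3 and when applied to 3.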

This is an example of a derived form: a piece of source-language syntax that is defined (in the language reference manual) by macro-expansion into core language syntax.

You shouldn't need to worry about whether some syntax is core or derived. Unfortunately, you sometimes do, because compiler error messages are reported in terms of the core syntax translation.

Conjunction and Disjunction

The boolean operators andalso (note: not and, which means something else in ML!) and orelse are also derived forms for case expressions, e.g.,

code125

This definition makes it clear that andalso is a "short-circuiting" operator: its second operand is evaluated only if its first operand is known to be true. orelse short-circuits in the same way: its second operand is evaluated only if its first operand is false.
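Short-circuiting can be observed by putting an expression that would raise an exception in the second position. In this sketch (all names hypothetical), 1 div 0 would raise Div if it were ever evaluated:

```sml
(* Illustrative names only; not from the original slides. *)

(* The second operand is never evaluated, so no exception is raised. *)
val viaAndalso = false andalso (1 div 0 = 0)

(* The corresponding case expression behaves the same way. *)
val viaCase1 = case false of true => (1 div 0 = 0) | false => false

(* orelse skips its second operand when the first is true. *)
val viaOrelse = true orelse (1 div 0 = 0)
val viaCase2 = case true of true => true | false => (1 div 0 = 0)
```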

We'll shortly see that most operators in ML are just like functions. Why can't andalso and orelse be?

Function Application and Scope

As in other languages, function applications are evaluated by:

- evaluating the actual argument;

- binding the resulting value to the formal parameter of the function;

- evaluating the body of the function with that binding in effect; and

- returning the value of that body expression as the function result.

code132
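The steps above can be traced with a small sketch. It assumes a hypothetical counter, evals, that is incremented each time the argument expression is evaluated, which shows that the argument is evaluated once, before the body runs:

```sml
(* Illustrative names only; not from the original slides. *)

(* evals counts how many times the argument expression is evaluated. *)
val evals = ref 0
fun arg () = (evals := !evals + 1; 2 + 3)

fun square n = n * n

(* 1. arg () is evaluated to 5;  2. 5 is bound to n;
   3. the body n * n is evaluated;  4. 25 is returned. *)
val r = square (arg ())
val count = !evals    (* 1: the argument was evaluated once, though n is used twice *)
```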

What if a function body mentions a non-local identifier (a free variable)? Such an identifier must be in scope, with a value, at the point where the function is defined:

code136

A Crucial Fact

Only the values in scope at the point of the function definition matter. Subsequent redefinitions of the identifier are irrelevant!

code139

code141
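A minimal sketch of this behaviour, using a hypothetical free variable scale and function scaled:

```sml
(* Illustrative names only; not from the original slides. *)
val scale = 2
fun scaled n = n * scale     (* scale is a free variable of the body *)
val r1 = scaled 5            (* 10 *)

val scale = 100              (* a new binding; it does not affect scaled *)
val r2 = scaled 5            (* still 10 *)
```

The second declaration of scale creates a fresh binding rather than updating the old one, so scaled continues to use the value 2 that was in scope when it was defined.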

Operators

The standard arithmetic, string, and logical operators are just "built-in" functions; they generally obey all the same rules as user-defined functions.

Most of the standard binary operators are infix, so that you write them between their operands, rather than in the usual prefix style for function application, e.g.

code215

instead of

code218

Actually, any function can be made infix by an appropriate declaration, and its associativity and precedence can also be declared. E.g., the "built-in" declarations for the integers are:

code150

Once a symbol is declared infix, it can still be used in a prefix fashion by adding the keyword op; e.g., we can write

code221
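The following sketch shows an operator used infix and via op, and a user-defined function made infix. The function pow and its precedence level 8 are assumptions chosen for illustration, not built-in declarations:

```sml
(* Illustrative names only; not from the original slides. *)
val a = 3 + 4                        (* infix use: 7 *)
val b = op + (3, 4)                  (* prefix use via op: 7 *)
val total = foldl op + 0 [1, 2, 3]   (* op lets + be passed as an argument: 6 *)

(* A user-defined function taking a pair of integers... *)
fun pow (_, 0) = 1
  | pow (b, e) = b * pow (b, e - 1)

(* ...declared infix with an arbitrarily chosen precedence of 8,
   so it binds more tightly than + in the expression below. *)
infix 8 pow

val c = 2 pow 3 + 1                  (* (2 pow 3) + 1 = 9 *)
val d = op pow (2, 3)                (* prefix use via op: 8 *)
```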

Anonymous functions

fun declarations bind functions to names. ML also has function expressions which allow you to define anonymous functions.

For example:

code224

is the (anonymous) function that adds 1 to its argument.

We can use a function expression wherever a function name would make sense, e.g.,

code160

or, a little more curiously,

code162

Naturally, function expressions can also be bound to names, e.g.,

code164
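The uses just described can be sketched as follows, with the function that adds 1 to its argument appearing as an argument to map, applied directly, and bound to the hypothetical name inc:

```sml
(* Illustrative names only; not from the original slides. *)

(* A function expression passed where a function name could appear. *)
val r1 = map (fn x => x + 1) [1, 2, 3]   (* [2, 3, 4] *)

(* A function expression applied directly to an argument. *)
val r2 = (fn x => x + 1) 5               (* 6 *)

(* A function expression bound to a name. *)
val inc = fn x => x + 1
val r3 = inc 5                           (* 6 *)
```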

Anonymous functions (continued)

In fact, the usual function declaration syntax (using fun)

code168

is just a derived form for the fn binding above.
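For a recursive function the expansion needs the rec keyword, so that the name is in scope inside its own body. A sketch with two hypothetical factorial functions:

```sml
(* Illustrative names only; not from the original slides. *)

(* The usual declaration syntax. *)
fun fact1 n = if n = 0 then 1 else n * fact1 (n - 1)

(* The corresponding fn binding; rec makes fact2 visible in its own body. *)
val rec fact2 = fn n => if n = 0 then 1 else n * fact2 (n - 1)
```

Both functions return 120 when applied to 5.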

Note: fn ≠ fun !!

We'll see good uses for anonymous functions later.

Meanwhile, what should it mean to evaluate a function expression?

Clearly, the body of the function should not be evaluated (this happens only later, when the function is applied).

One thing that does happen (in the underlying implementation) is that the values of any free variables in the body are recorded for use when the function is later applied. You don't need to worry about this explicitly, but it can be handy to remember when you're trying to figure out how free variables work!
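The effect of recording free variables can be seen in a function that returns a function expression. In this sketch, makeAdder, add3 and add10 are hypothetical names; each evaluation of the fn expression records the current value of the free variable n:

```sml
(* Illustrative names only; not from the original slides. *)

(* Evaluating the fn expression does not evaluate x + n;
   it records the value of the free variable n for later use. *)
fun makeAdder (n : int) = fn x => x + n

val add3 = makeAdder 3       (* records n = 3 *)
val add10 = makeAdder 10     (* records n = 10 *)

val r = (add3 1, add10 1)    (* (4, 11) *)
```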


Andrew P. Tolmach
Thu Apr 10 18:49:31 PDT 1997