Remember to submit the project to cs321-01@cs.pdx.edu. Tolmach has office hours MW and the tutor has office hours TTh.

Questions about Homework 1

ENV class

Env is a linked list of variables. A new variable can be added to the head of the list, but pointers may still exist to any point in that list.
ValueEnv is just like the Env class, except that each entry has a string ("+") as well as an associated value. It makes sense to create ValueEnv as a subclass of Env.
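A minimal sketch of the idea, not the homework's actual code: the class layout, the field names (name, next, value) and the lookup method are all assumptions made for illustration. It shows why adding at the head is safe even when older pointers into the list still exist.

```java
// Hypothetical sketch: names and fields are illustrative, not the homework's.
class EnvSketch {
    // One node of the environment: a name plus a link to the rest of the list.
    static class Env {
        final String name;
        final Env next; // null marks the end of the list

        Env(String name, Env next) {
            this.name = name;
            this.next = next;
        }

        // Walk from this node toward the tail looking for a name.
        Env lookup(String target) {
            for (Env e = this; e != null; e = e.next) {
                if (e.name.equals(target)) return e;
            }
            return null;
        }
    }

    // Same shape as Env, with a value attached to the name (assumed to be an int here).
    static class ValueEnv extends Env {
        final int value;

        ValueEnv(String name, int value, Env next) {
            super(name, next);
            this.value = value;
        }
    }

    public static void main(String[] args) {
        Env base = new ValueEnv("x", 1, null);
        Env extended = new ValueEnv("y", 2, base); // new head; base is untouched
        System.out.println(base.lookup("y") == null);     // the older pointer does not see y
        System.out.println(extended.lookup("x") == base); // the tail is shared, not copied
    }
}
```

Because nothing is ever modified in place, any code still holding base keeps seeing exactly the list it had before.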

Class Lecture

Slide 2: Lexical Analysis

Content-free characters (such as white space) make code readable to humans, but they are largely irrelevant to the compiler. The few exceptions are old languages, where formatting is relevant to program flow.
When we get to syntax analysis, we will see that it is strictly more powerful than lexical analysis... So, why bother with lexical analysis when we can do it all with syntax analysis?

Slide 3: Lexical Analysis Example

  The patterns are not mutually exclusive. The input "then" could be an identifier or a THEN token, so we assume that a definition higher on the list takes precedence over one lower. We also need to look at the language grammar to determine how to interpret "thenp".
What problems could we have with just throwing away white space?
  A lexeme is the piece of the input that matches a pattern; it is what gives us the token's attribute.

Slide 5: Stream Interface

The easiest way to write a lexical analyzer is as a stream-to-stream model. Historically memory was expensive, so compilers were not designed to hold the contents of the source files in memory. The file reader passes the input character by character to the lexical analyzer, which then passes it token by token to the parser.
The unread() function is useful to return already read characters back to the file stream.
getToken returns a class, which holds the type of token (IF, STRING, GT, ID, etc.) and the attribute ("x", 3, "foobar!", etc.).
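A sketch of what such a token class could look like. The names Token, Type and attribute are assumptions, and NUM is a made-up type added only so that the attribute 3 has somewhere to go; the slide's real class may differ.

```java
// Hypothetical token class: a type tag plus an optional attribute.
class TokenSketch {
    // IF, STRING, GT and ID come from the notes; NUM is an assumption.
    enum Type { IF, STRING, GT, ID, NUM }

    static class Token {
        final Type type;
        final Object attribute; // null when the type alone says everything (IF, GT)

        Token(Type type, Object attribute) {
            this.type = type;
            this.attribute = attribute;
        }

        @Override
        public String toString() {
            return attribute == null ? type.toString() : type + "(" + attribute + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Token(Type.ID, "x"));           // ID(x)
        System.out.println(new Token(Type.NUM, 3));            // NUM(3)
        System.out.println(new Token(Type.STRING, "foobar!")); // STRING(foobar!)
        System.out.println(new Token(Type.GT, null));          // GT
    }
}
```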
What about errors? What happens if there is a lexical error, such as a dollar sign in an identifier? In Java there is an easy interface for this: the Exception. We have all used compilers that don't stop when they hit an error, whether we wanted them to or not. The question that needs to be asked is: how do we recover from an error in that situation?
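A rough sketch of reporting a lexical error with a Java exception. LexicalException and scanIdentifier are hypothetical names, and the rule that identifiers contain only letters and digits is an assumption about the source language, not something the slide states.

```java
// Hypothetical sketch of signalling a lexical error with an exception.
class LexErrorSketch {
    // Made-up exception type that remembers where the bad character was.
    static class LexicalException extends Exception {
        final int position;

        LexicalException(String message, int position) {
            super(message);
            this.position = position;
        }
    }

    // Scan one identifier starting at index 0; it ends at white space or end of input.
    static String scanIdentifier(String input) throws LexicalException {
        int i = 0;
        while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
            char c = input.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                throw new LexicalException("illegal character '" + c + "' in identifier", i);
            }
            i++;
        }
        return input.substring(0, i);
    }

    public static void main(String[] args) {
        try {
            scanIdentifier("to$tal = 1");
        } catch (LexicalException e) {
            System.out.println(e.getMessage() + " at position " + e.position);
        }
    }
}
```

The caller decides what to do in the catch block: stop, or skip ahead and keep scanning. That choice is the recovery question raised above.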

Slide 6: Hand-coded Scanner (in Pseudo-Java)

Inside the digit read loop:
do { n = n * 10 + (c - '0'); c = read(); }
This works because the digit characters are consecutive and in order in the character set, so subtracting the character code of '0' from the character code of the current character gives the integer value of that digit.
In many cases the lexical analyzer is the slowest part of the compiler because it is the only part of the program that has to look at every character of the source file. Hand-coding it this way is very efficient, but it is also really easy to get wrong, because so many concerns are interleaved: input (the read() and unread() invocations), output (return...), pattern matching (is whitespace, is not a digit, is a digit, etc.) and conversions (n = n * 10 + (c - '0');).
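A small runnable sketch of the number-scanning part only, assuming java.io.PushbackReader stands in for the slide's read()/unread() stream. The method name scanNumber and the -1 return convention are made up for this example, and overflow is ignored.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical hand-coded scanner fragment for unsigned decimal numbers.
class NumberScanSketch {
    // Skip white space, then read one number. Returns -1 if no number starts here.
    static int scanNumber(PushbackReader in) throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) c = in.read(); // pattern: is whitespace
        if (c == -1) return -1;                                     // end of input
        if (c < '0' || c > '9') {                                   // pattern: is not a digit
            in.unread(c);
            return -1;
        }
        int n = 0;
        do {
            n = n * 10 + (c - '0');                                 // conversion
            c = in.read();                                          // input
        } while (c >= '0' && c <= '9');                             // pattern: is a digit
        if (c != -1) in.unread(c); // give back the character that ended the number
        return n;                                                   // output
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("  123+45"));
        System.out.println(scanNumber(in));   // 123
        System.out.println((char) in.read()); // +
        System.out.println(scanNumber(in));   // 45
    }
}
```

Forgetting the final unread() would silently swallow the '+', which is the kind of mistake that makes hand-coded scanners easy to get wrong.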

Slide 7: Example

note: "any number of" includes zero.

Slide 8: Regular expressions

A regular expression defines a set of strings, and that set may contain infinitely many strings.

Slide 9: Languages: Some preliminary definitions

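Why is it so important that L^0 is the set containing the empty string rather than the empty set?
  Because that set is not empty... it has one element, the empty string, which itself has no characters.
  What would happen if L^0 was the empty set?
  Each power is built by concatenating L with the previous power, and concatenating with the empty set gives the empty set. So raising L to the i would still produce an empty set for every i, and the definition would be useless.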
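To see the difference concretely, here is a sketch that computes powers of a small finite language with Java sets. The helper names concat and power are made up, and the starting set L^0 is passed as a parameter so both choices can be tried.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: powers of a finite language represented as a set of strings.
class LanguagePowerSketch {
    // Concatenation of languages: every string of a followed by every string of b.
    static Set<String> concat(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>();
        for (String x : a) {
            for (String y : b) out.add(x + y);
        }
        return out;
    }

    // L^i computed as L concatenated with L^(i-1), starting from the supplied L^0.
    static Set<String> power(Set<String> l, int i, Set<String> zeroth) {
        Set<String> result = zeroth;
        for (int k = 0; k < i; k++) result = concat(l, result);
        return result;
    }

    public static void main(String[] args) {
        Set<String> l = Set.of("a", "b");
        System.out.println(power(l, 2, Set.of(""))); // [aa, ab, ba, bb]
        System.out.println(power(l, 2, Set.of()));   // []
    }
}
```

With the empty set as the starting point the inner loop of concat never runs, so every power comes out empty.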

Slide 10: Regular expressions and languages

Remember! {a} is the set containing the single string "a". Languages are sets of strings, not sets of characters.
dot = concatenation
| = alternation (or)

Slide 11: Regular expressions

(a|b)* = (a*b*)*, which tells us that two different regular expressions can denote the identical set of strings.
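One way to get some confidence in an identity like this is to test both expressions on every short string, here using java.util.regex. This is only a sanity check on a finite sample, not a proof; agreeUpTo is a made-up helper, and the extra letter c is included so that rejected strings are tried as well.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: compare two regular expressions on all short strings.
class RegexEquivalenceSketch {
    // True if both regexes accept exactly the same strings over {a, b, c} up to maxLen characters.
    static boolean agreeUpTo(String r1, String r2, int maxLen) {
        return check(Pattern.compile(r1), Pattern.compile(r2), "", maxLen);
    }

    // Depth-first enumeration: test s, then every one-character extension of s.
    private static boolean check(Pattern p1, Pattern p2, String s, int remaining) {
        if (p1.matcher(s).matches() != p2.matcher(s).matches()) return false;
        if (remaining == 0) return true;
        for (char c : new char[] {'a', 'b', 'c'}) {
            if (!check(p1, p2, s + c, remaining - 1)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeUpTo("(a|b)*", "(a*b*)*", 6)); // true
        System.out.println(agreeUpTo("(a|b)*", "a*b*", 6));    // false: "ba" separates them
    }
}
```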

Slide 12: Regular definitions

This is a way to give names to regular expressions. The names are only syntactic sugar, so we must be careful not to go beyond that by creating recursive definitions.
	GOOD -> ε | a GOOD b    == { a^n b^n | n >= 0 }
	This is illegal because we can never expand it to the point of being an ordinary regular expression. Each expansion leaves another instance of GOOD.
	
	This is good for syntactic analysis, but not for lexical analysis.
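For contrast, the recursive GOOD definition is easy to recognize with a recursive procedure, which is the kind of thing a parser does. The sketch below is only an illustration; good and accepts are made-up names, and -1 is used as a failure marker.

```java
// Hypothetical recursive recognizer for GOOD -> (empty) | a GOOD b.
class AnBnSketch {
    // Returns the index just past the GOOD that starts at pos, or -1 on failure.
    static int good(String s, int pos) {
        if (pos < s.length() && s.charAt(pos) == 'a') {
            int afterInner = good(s, pos + 1); // the nested GOOD
            if (afterInner >= 0 && afterInner < s.length() && s.charAt(afterInner) == 'b') {
                return afterInner + 1;         // the b that pairs with this a
            }
            return -1;                         // an a without its matching b
        }
        return pos;                            // the empty alternative
    }

    // The whole input must be consumed by a single GOOD.
    static boolean accepts(String s) {
        return good(s, 0) == s.length();
    }

    public static void main(String[] args) {
        System.out.println(accepts("aabb")); // true
        System.out.println(accepts("aab"));  // false
        System.out.println(accepts(""));     // true
    }
}
```

The call stack is what remembers how many a's are still waiting for a b, which is exactly the memory an ordinary regular expression does not have.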

Slide 13: Specifying lexical analyzers

A lexeme is a piece of input that matches a pattern (see slide 3).
	We can specify the behavior of an analyzer in this way. We can write the code by hand, or we can use a meta-program that generates the code for us from the regular expressions.
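A rough sketch of driving an analyzer from such a specification, using java.util.regex instead of a real generator. The rule list, the token names and the tokenize method are assumptions for illustration; the longest match wins, and on a tie the rule listed first wins, as on slide 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical table-driven analyzer: behavior comes from an ordered list of rules.
class LexSpecSketch {
    // One rule of the specification: a token name and the pattern for its lexemes.
    static class Rule {
        final String token;
        final Pattern pattern;

        Rule(String token, String regex) {
            this.token = token;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Made-up rule list; order matters when two rules match the same lexeme.
    static final List<Rule> RULES = List.of(
        new Rule("THEN", "then"),
        new Rule("ID", "[a-z][a-z0-9]*"),
        new Rule("NUM", "[0-9]+"),
        new Rule("WS", "[ \\t\\n]+"));

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule r : RULES) {
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                // lookingAt matches only at the region start; strict > keeps the earlier rule on ties.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best == null) throw new IllegalArgumentException("no rule matches at " + pos);
            if (!best.token.equals("WS")) { // white space is matched but not reported
                out.add(best.token + "(" + input.substring(pos, bestEnd) + ")");
            }
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("then thenp 42")); // [THEN(then), ID(thenp), NUM(42)]
    }
}
```

A generated analyzer would compile the rules into one automaton rather than trying each pattern in turn; the loop here trades speed for brevity.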