CS321 Prog Lang & Compilers                                Project #2
Assigned: Feb 19, 2007                             Due: March 5, 2007


MINI Parser


In this assignment you will use ML-Yacc, a parser generator tool, to
automate building a bottom-up (LALR) parser that constructs parse trees
for MINI programs.

The project has two parts:
A) Getting the parser to work.
B) Adding the actions that build a parse tree.

You will turn in an ML-Yacc file (.grm file) and the necessary
boilerplate files to tie everything together.

*** Overview:

ML-Yacc uses context-free grammars to specify the syntax of
languages to be parsed.  To check the syntax of a program,
the parser operates on a stream of tokens generated by the
lexical analyzer.  You will use either the lexical analyzer you
designed in Project 1 or the one I supply for this assignment.
There is one small change: the Token datatype is now supplied by
the .grm file (see below). My lexical analyzer will be available
later this week, and I will mail it to you on request.


*** General Process Steps:


1) ML-Yacc documentation can be found at
http://www.cs.princeton.edu/~appel/modern/ml/ml-yacc/manual.html


PART A.

2) Transcribe the grammar from the general project handout
into an ML-Yacc file. You will need to add terminal and non-terminal
declarations (%term and %nonterm), as described in class.

3) Use MINI's precedence and associativity rules to remove
ambiguity from the expression productions. You do this by adding
precedence declarations (%left, %right, %nonassoc) to the .grm file.
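The precedence declarations must make the parser produce particular tree
shapes. As an illustration, the sketch below builds by hand the trees one
would expect for 1 + 2 * 3 and 10 - 4 - 3, assuming MINI follows the usual
Java conventions (multiplication binds tighter than addition, and
subtraction is left-associative); check the general project handout for
MINI's actual rules. It uses a trimmed copy of the ProgramTypes definitions
with dummy (0,0) locations, and the helpers num, opName, and show are
hypothetical, for illustration only.

```sml
(* Trimmed copy of the ProgramTypes definitions needed here *)
type loc = int * int
datatype BINOP = ADD | SUB | MUL | DIV | AND | OR
datatype Constant = Cint of string | Creal of string | Cbool of bool
datatype Exp
  = Literal of loc * Constant
  | Binop of BINOP * Exp * Exp

(* Hypothetical helper: an integer literal at a dummy location *)
fun num n = Literal ((0, 0), Cint n)

(* Hypothetical helpers: print a tree fully parenthesized *)
fun opName ADD = "+" | opName SUB = "-" | opName MUL = "*"
  | opName DIV = "/" | opName AND = "&&" | opName OR = "||"

fun show (Literal (_, Cint s)) = s
  | show (Literal (_, Creal s)) = s
  | show (Literal (_, Cbool b)) = Bool.toString b
  | show (Binop (oper, l, r)) =
      "(" ^ show l ^ " " ^ opName oper ^ " " ^ show r ^ ")"

(* 1 + 2 * 3: MUL has higher precedence, so it sits lower in the tree *)
val prec = Binop (ADD, num "1", Binop (MUL, num "2", num "3"))

(* 10 - 4 - 3: SUB is left-associative, so the left operand is nested *)
val assoc = Binop (SUB, Binop (SUB, num "10", num "4"), num "3")

val () = print (show prec ^ "\n")    (* prints (1 + (2 * 3)) *)
val () = print (show assoc ^ "\n")   (* prints ((10 - 4) - 3) *)
```

If the declarations are missing or wrong, ML-Yacc reports shift/reduce
conflicts, and its default resolution can silently produce the other shape.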

4) Once your specification file is ready, process it with ML-Yacc to
generate the parser.


5)  Write a driver (in ML) using the boilerplate files at
http://www.cs.pdx.edu/~sheard/course/Cs321/LexYacc/boilerplate/

    a) You must use the Compilation Manager (CM) for Project 2.

    b) You must connect your lexer and parser together;
       this is what the boilerplate files do. In particular,
       the Token datatype is defined by the parser in
       the .grm file. See the examples and the discussions
       in the class notes.

*** Error-Reporting Requirements:

6) On a syntactically correct program, the driver prints a message
stating that the file is a valid MINI program.

7) On a syntactically incorrect program, it prints an error message
stating where the error occurred. Exceptions will be useful here.
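Requirements 6 and 7 can both be met with a single handle expression
wrapped around the call to the parser. The sketch below is a hypothetical
driver fragment: it copies the two exceptions from ProgramTypes, takes a
function parse : string -> 'a standing in for whatever your boilerplate
exposes, and assumes the int carried by ParseError is a line number. The
names check, goodParse, and badParse are illustrative only.

```sml
(* Copies of the two exceptions declared in ProgramTypes *)
exception LexicalError of string * (int * int)
exception ParseError of string * int

(* Hypothetical driver wrapper: returns true and prints a success message
   if parsing succeeds; otherwise prints where the error occurred. *)
fun check (fileName : string) (parse : string -> 'a) : bool =
  (ignore (parse fileName);
   print (fileName ^ " is a valid MINI program\n");
   true)
  handle LexicalError (msg, (line, col)) =>
           (print ("Lexical error at line " ^ Int.toString line ^
                   ", column " ^ Int.toString col ^ ": " ^ msg ^ "\n");
            false)
       | ParseError (msg, line) =>
           (print ("Syntax error at line " ^ Int.toString line ^
                   ": " ^ msg ^ "\n");
            false)

(* Stub parsers standing in for the real one, for illustration only *)
fun goodParse _ = ()
fun badParse _ = raise ParseError ("unexpected token", 7)

val ok  = check "good.mini" goodParse
val bad = check "bad.mini" badParse
```

Here ok is true and bad is false; the second call prints
"Syntax error at line 7: unexpected token".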

8) Note that certain constructs in the datatypes have a slot for a
"loc" location, which is a (line, column) pair of integers. Be sure
to place these correctly. The lexer should supply a location with
every token.
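One way for a lexer to produce these pairs is to convert each token's
character offset into a (line, column) pair. The helper below is a
hypothetical sketch that assumes 0-based offsets into the whole source text
and 1-based lines and columns; the exact convention, and how your lexer
obtains the offset, are up to you.

```sml
(* Hypothetical helper: convert a 0-based character offset into the
   source text into a 1-based (line, column) pair. *)
fun locOf (text : string) (pos : int) : int * int =
  let
    (* walk the text up to pos, counting newlines *)
    fun go (i, line, col) =
      if i >= pos orelse i >= String.size text then (line, col)
      else if String.sub (text, i) = #"\n" then go (i + 1, line + 1, 1)
      else go (i + 1, line, col + 1)
  in
    go (0, 1, 1)
  end

val src = "class A {\n  int x;\n}"
val intLoc = locOf src 12   (* (2, 3): the position of "int" *)
```

Rescanning from the start on every call is quadratic over a whole file; a
real lexer would more likely keep the current line number and the offset
where that line starts in refs, updating them as it reads newlines.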


PART B.

The second part of the project is to add actions to the grammar
rules that build a parse tree. Do this only AFTER YOU GET PART A
working. A few notes:


A) The Assign construct contains slots for a string and a Basic
type that will be filled in later by the type checker.

    Assign of (Exp*string) option * Id * (Exp*Basic) option * Exp

The parser can use the empty string and Int for every Assign it
builds, and we will fill in the real values later. In the examples
below, {e} stands for the tree built for e:

p.x=5     Assign(SOME({p},""), {x}, NONE, {5})
x[2]=5    Assign(NONE, {x}, SOME({2},Int), {5})
p.x[2]=5  Assign(SOME({p},""), {x}, SOME({2},Int), {5})
x=5       Assign(NONE, {x}, NONE, {5})
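A grammar action for Assign can delegate the placeholders to a small helper
so they are filled in the same way everywhere. The sketch below assumes a
trimmed copy of the ProgramTypes definitions; mkAssign and the sample trees
are hypothetical names, and the locations are dummy values.

```sml
(* Trimmed copy of the ProgramTypes definitions needed here *)
type Id = string
type loc = int * int
datatype Basic = Bool | Int | Real
datatype Constant = Cint of string | Creal of string | Cbool of bool
datatype Exp = Literal of loc * Constant | Var of loc * Id
datatype Stmt =
    Assign of (Exp * string) option * Id * (Exp * Basic) option * Exp

(* Hypothetical helper a grammar action could call: the parser fills the
   placeholder string with "" and the placeholder Basic with Int. *)
fun mkAssign (obj : Exp option, field : Id, index : Exp option, rhs : Exp) =
  Assign (Option.map (fn e => (e, "")) obj,
          field,
          Option.map (fn e => (e, Int)) index,
          rhs)

val p    = Var ((1, 1), "p")            (* dummy locations *)
val two  = Literal ((1, 3), Cint "2")
val five = Literal ((1, 6), Cint "5")

val s1 = mkAssign (SOME p, "x", NONE, five)       (* p.x=5    *)
val s2 = mkAssign (NONE, "x", SOME two, five)     (* x[2]=5   *)
val s3 = mkAssign (SOME p, "x", SOME two, five)   (* p.x[2]=5 *)
val s4 = mkAssign (NONE, "x", NONE, five)         (* x=5      *)
```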

B) Any subtree of type ('a TC) is filled in later by the type
checker, so the parser should fill in NONE. For example, the last
component of ArrayElm is such a slot:

    ArrayElm of Exp * Exp * (Basic TC)

x[3]     (ArrayElm({x},{3},NONE))
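The sketch below, using a trimmed copy of the ProgramTypes definitions,
shows the tree the parser builds for x[3] and, purely for illustration, how
a later type-checking pass might replace the NONE once it knows the element
type. fillElemType is a hypothetical name and is not part of Project 2; the
locations are dummy values.

```sml
(* Trimmed copy of the ProgramTypes definitions needed here *)
type loc = int * int
type 'x TC = 'x option
datatype Basic = Bool | Int | Real
datatype Constant = Cint of string | Creal of string | Cbool of bool
datatype Exp
  = Literal of loc * Constant
  | Var of loc * string
  | ArrayElm of Exp * Exp * (Basic TC)

(* What the parser builds for x[3]: the TC slot is always NONE *)
val parsed = ArrayElm (Var ((1, 1), "x"), Literal ((1, 3), Cint "3"), NONE)

(* Hypothetical: a later type checker filling the slot once it knows
   that x is an int array. Not part of Project 2. *)
fun fillElemType (ArrayElm (a, i, NONE), t) = ArrayElm (a, i, SOME t)
  | fillElemType (e, _) = e

val checked = fillElemType (parsed, Int)
```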

A parse tree is a value of type Program, which is an ML datatype.
You must use the datatypes specified below. They are also
conveniently packaged in a file called "ProgramTypes.sml", which
defines them in a structure called ProgramTypes. This file
can be downloaded from the projects page.

--------------------------------------------------------------

structure ProgramTypes = struct

exception LexicalError of string * (int * int);
exception ParseError of string * int;

type Id = string;

(** Representing types for mini-Java **)

type loc = (int * int)

datatype Basic = Bool | Int | Real;

datatype Type
  = BasicType of Basic
  | ArrayType of Basic
  | ObjType of Id
  | VoidType;


(* A slot of type ('x TC) is an option *)
(* type. The parser places NONE there  *)
(* and the type-checker fills it in    *)
(* with (SOME x) when "x" is known     *)

type 'x TC = 'x option;

(******** Representing Programs *******)


datatype BINOP = ADD | SUB | MUL | DIV        (* Arithmetic *)
               | AND | OR;
datatype RELOP =  EQ | NE | LT | LE | GT | GE;

datatype Constant         (* Literal constants *)
  = Cint of string
  | Creal of string
  | Cbool of bool;

datatype Exp
  = Literal of loc * Constant           (* 5, 6.3, true *)
  | Binop of BINOP * Exp * Exp          (* x + 3        *)
  | Relop of RELOP * Exp * Exp          (* x < 7.7      *)
  | Not of Exp                          (* ! x          *)
  | ArrayElm of Exp * Exp * (Basic TC)  (* x[3]         *)
  | ArrayLen of Exp                     (* x.length()   *)
  | Call of Exp * Id * Id TC * Exp list (* x.f(1,z)     *)
  | NewArray of Basic * Exp             (* new int[3]   *)
  | NewObject of loc*Id                 (* new point()  *)
  (* Coerce is used only in type checking               *)
  | Coerce of Exp
  | Member of loc * Exp * Id * Id
  | Var of loc * Id
  | This of loc


datatype Stmt
  = Block of Stmt list       (* {x=5; print(2)}           *)
  | Assign of (Exp*string) option * Id * (Exp*Basic) option * Exp
                             (* p.x[2]=5  p.x=5  x=5      *)
  | CallStmt of Exp * Id * Id TC * Exp list
                             (* x.f(1,z)                  *)
  | If of Exp * Stmt * Stmt  (* if (p<2) x=5 else x=6     *)
  | While of Exp * Stmt      (* while (p) s               *)
  | PrintE of Exp            (* System.out.println(x)     *)
  | PrintT of string         (* System.out.println("zbc") *)
  | Return of Exp option;    (* return (x+3)              *)

datatype VarDecl = VarDecl of loc * Type * Id * Exp option;

datatype Formal = Formal of Type * Id;

datatype MetDecl = MetDecl of loc * Type * Id *
                   Formal list * VarDecl list *
                   Stmt list;


datatype ClassDec
  = ClassDec of loc * Id * Id * VarDecl list * MetDecl list;


datatype Program = Program of ClassDec list;


end (* ProgramTypes *)
----------------------------------------------------------

1) On a syntactically correct program, return the syntax
tree built by the parser.

2) On a syntactically incorrect program, you should catch any
exceptions raised, and die gracefully by printing out an error
message describing the exception.


*** Additional Resources:


1.  Textbook Chapter 3.
2.  YACC in the C world: http://dinosaur.compilertools.net/yacc/index.html


*** What and How to Turn in Your Program:


1. Submit your ML-Yacc specification (the .grm grammar file) and your
   boilerplate programs electronically to: sheard@cs.pdx.edu


   a) The subject line should be P2Last-name, so if your last name is
      "Jones" the subject line should contain only "P2Jones".


   b) All files should be attachments, and they should be named
      using the same convention used for the subject line.

   c) PLEASE submit a solution only once, so be sure you have
      everything the way you want it before you submit.