|
Set: Monday 18th
April 2016
Code due: Wednesday
27th April 2016 at or before 13:00
|
|
Summary
Implement a lexical analyzer that takes as input a text
string and produces as output a list of tokens.
The input represents programs in a simple language of
identifiers, assignments and method requests.
The goals of the assignments are
- Use a FSA for lexing, with objects
representing the states. Delegate behavior to the
states using the state pattern
- Limit dependencies between the state objects.
- Make your code easy to read and as modular as
possible, because it is likely that the language you are
lexing will change.
- To practice test-driven design and development.
Lexical Structure
The languages comprises the following lexical elements:
- identifiers, which start with a letter and
contain letters, digits, underscore _, and dash '.
- numerals, which are sequences of digits and
contain at most one decimal point
- strings, which are sequences of characters
enclosed by " and ". Inside a string, \"
represents a quote, \n a newline, and \t a tab.
It's illegal to have a newline in a string.
- uninterpreted strings, which are sequences of
arbitrary characters enclosed by """ and """. Any
characters, including " and newline, can appear in an
uninterpreted string.
- dots, which are the separators in method
requests
- assignment symbols, :=
- definition symbols, =
- operators, which are sequences of the operator
characters -&|:$#%^@?*\/+!=><~
- parentheses ( and )
- brackets [ and ]
- braces { and }
- newlines, which can be represented by the
Unicode line feed (LF) character, by the Unicode
carriage return (CR) character, or by the Unicode line
separator (U+2028) character; a line feed that
immediately follows a carriage return is ignored.
Notice that some of these categories seem to overlap.
So, while := should be lexed as an assignment symbol, == and
:=: are operator symbols. Similarly, newlines are
treated as ordinary characters while inside an uninterpreted
string, but in other contexts are
Inputs that don't end with a newline should be treated as if
they end with a newline.
Input and Output
The input to your lex method should be a
String. The output should be a list of Tokens; the
tokens are defined in token.grace.
Pairing
If you want to work on this assignment as a pair (with
another student from the class), that's OK. In fact,
it's good! You will learn more, provided that you
actually pair program.
Conversely, it's not ok partner with someone else by
dividing up the modules between you so that each of you only
works on half of the code.
Tests
A tiny start at a testing framework is in lexer_test.grace.
Expand this into a full set of tests.
Hints
- Don't try and build a FSA for all of the above
categories at once. Start with a test for some of the
basic functionality, like lexing identifiers and
operator symbols. Once you have one test passing,
add another.
- You may want to write unit tests for behavior of some
of the internals of the lexer, like the behavior of a
particular state, or for the pattern of state
transitions. This will involve testing methods
that would normally be internal to the lexer. You
may need to make them visible just so that you can test
them.
- Don't write long methods. Use the composed method
pattern from Beck, both because I'm grading on
it, and because it will keep your code clean. If you
write more than 10 lines in a method, think about
decomposing it.
- As you learn more about the problem space, you will
realize that some of your early decisions were wrong.
Don't be afraid to go back and change them. Software is
soft! Embrace Change! I already changed my
mind twice about what the arguments of the token
creation methods should be. You may decide to
change them once again. That's OK; but explain why you
make any changes that you find necessary.
- Once a test is working, go back and refactor your code
to keep it clean. Saying "I'll clean up later" is
a great way to get into Technical Debt.
- Save your work frequently. Put your code in
version control (git, subversion, mercurial ... )
Submitting your work
Submit your work as a .zip file in the d2l
dropbox. You will need to tell me about the
composition of your group so that I can create the
dropbox. Your zip file should contain:
- a module A2Lexer.grace that contains your
implementation
- a module A2Lexer_test.grace that contains your
tests
- a file A2report.pdf (or A2report.txt)
that contains your report. Use this to tell me
what you learned, problems you solved, issues you ran
into and couldn't solve, features of Grace that made
this easier or harder than you expected,
etc. The report can be quite short; half a
page is probably ample. Write more here only if
you need to. The code itself and the
intention-revealing names you use in it should tell me
the details. If you feel the need to explain
your code in the report, stop! Go back and
clarify the code.
Resources
|