Term Project
This page provides details about the term project
component of CS350. The purpose of this project is to
give each student an opportunity to do some in-depth
practical work on the design, implementation, and
analysis of algorithms.
Important: Although you will have
approximately one month to work on the project, you
are required to submit a project proposal by 21st February to
receive credit (details
below).
General Description
The theme of the project is to implement one or more
of the algorithms that we have read about or studied
during this course, and to run tests to determine
whether the behavior that you see in practice agrees
with the predictions of the formal analysis. More
specifically, your project will explore one of the
following three topics:
- Convex Hull: Implement the brute force and
the Quickhull algorithms for convex hull.
Compare their time efficiency empirically, and see
how well these empirical results conform to what
analysis predicts.
- Dictionaries: Implement a dictionary using
various kinds of hash table and various kinds of
tree. By “dictionary” I mean a data
structure that maps keys (such as strings) to
values (such as data records), not a lexicon of
meanings for English words. See how these
data structures compare, for both space and time,
and compare your empirical results with those
predicted by analysis. Your data should include
situations where entries are deleted as well as
where they are added; for static data, good old
binary search is hard to beat!
- Good Sorts: Implement some of the
asymptotically-good sort algorithms, that is,
those that are in O(n lg n) on
average. Run them on a wide range of
examples to see if the formal analysis is a good
predictor of each algorithm's behavior in
practice, to see how the various
improvements change their behavior, and how their
space requirements compare.
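One common way to compare measurements with an asymptotic prediction is a doubling experiment: time the algorithm at sizes n and 2n and compare the ratio of the two times with the ratio that the analysis implies. The sketch below assumes a running time of the form T(n) = c n lg n, as for the Good Sorts topic. The function names and the shape of the `timings` dictionary are hypothetical and chosen only for illustration.

```python
import math


def predicted_ratio(n):
    """Ratio T(2n)/T(n) predicted if T(n) = c * n * lg n, for n > 1."""
    return (2 * n * math.log2(2 * n)) / (n * math.log2(n))


def observed_ratios(timings):
    """Given {n: seconds}, return {n: T(2n)/T(n)} for every n whose double was also timed."""
    return {n: timings[2 * n] / t for n, t in timings.items() if 2 * n in timings}
```

For a quadratic algorithm the predicted ratio would instead approach 4, so the same comparison can help to distinguish growth rates.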
If you have a burning desire to work on something
other than one of the three project ideas listed
above, please check with me and I'll be happy to give
you my thoughts and a green light to proceed if I
think the idea is appropriate and
interesting. If you want to read ahead and
select some of the more-advanced algorithms from the
textbook, that's fine.
You may use any reasonable programming language or
platform for your implementation; please check with me
if you have any concerns about your specific choices
in this area.
Collaboration
I'm strongly encouraging students to form teams
of two or three. Each team will submit
a single report. I will not accept larger
teams; experience has shown that the coordination
problems of a larger team outweigh the technical
problems, and it's on the latter that I want you to
focus your energies. Students who have a special
need to work on their own should ask for permission to
do so.
Team projects will enable you to investigate your
topic in greater depth—for example, with more thorough
testing and analysis, or with a broader range of
algorithms. Generally, I believe that you learn
more, and understand more deeply, by working with a
partner or two, for example, by pair
programming.
What won't work is a project in which you try
to divide the work among team members and then hope to
put the pieces together at the end. Such
projects are doomed! Please don’t try
this. Plan to work as a team; this means meeting
in person.
If you have had prior success working remotely with
partners, using screen-sharing and voice chat, you may
want to try this. My own experience is that
nothing beats sitting together at the same keyboard.
Project Report
At the end of the project, you will submit a written
report that will be used as the sole basis for
evaluating your project. Specifically, your
report will be expected:
- To show that you understand and can implement
standard algorithms (10%)
- To show that you can write programs that are
understandable, and algorithmically sound (30%)
- To provide evidence that you understand how
complexity theory shows up in practice (15%)
- To demonstrate your initiative, originality, and
algorithmic insights (15%)
- To show that you can communicate your work clearly
and concisely in a well-structured document that
conveys the purpose of your project, the
experimental procedures, your results, and
your conclusions (30%)
- For group projects, I expect a single group
report. I also expect a one-page write-up
from each individual, describing that
individual's role in the project, the role taken on
by the other team members, and how well the
collaboration followed the plan.
Specific items that may appear in a report include:
- Descriptions of the algorithm(s) that you are
working with, in your own words.
- Source code for particularly intricate or
interesting parts of the algorithm.
- Worked examples of your own
devising to show how the algorithms work. (This
means that you are not to copy the ones in
the book, the slides, the original papers where the
algorithms were introduced, or resources authored by
other people.)
- Details of the testing strategy
that you have used. This is likely to involve the
construction of code to generate large
pseudo-random test cases, and code to verify that
the results produced by your code are correct.
- Summaries of experimental data (e.g.,
tables or graphs showing the algorithms' behavior
over a range of different inputs).
- Reflections on what you have learned
as a result of your experience.
The rubric that I will use to grade the project
reports is here.
Deadlines, Project Proposal
The project final report will be due by the start of
class on Thursday 14th March, which is the last
scheduled class for CS350 before finals. However, you
are expected to submit a “proposal” for your project
by 18:00 on Thursday 21st February; the proposal is worth five
percent of the overall score for the class. You will forfeit those
points if you do not submit a proposal by this time;
there will be no extensions for project proposals
(except in cases of documented illness). Please
submit using d2l.
My advice is to submit your proposal early,
using d2l,
since that way you will get feedback earlier. This
is especially important for students who propose a
custom project (something other than Convex Hull,
Dictionaries, or Good Sorts). You can also discuss
custom projects with me informally on Piazza or in
office hours.
Your proposal should identify:
- the name(s) of the student(s) working on
the project;
- the choice of project topic;
- the implementation language;
- a list of the specific features that you
expect to include in your final report;
- a time plan that identifies at least 3
specific goals for each of the remaining 3 weeks of
the term; and
- a collaboration
plan that describes how your team intends
to work together.
The rubric for grading the project proposals is here.
In addition, although they are not required, I
encourage you to include additional preliminary
materials with your proposal (e.g., in-progress
implementations, testing code); I will review these
materials and provide feedback.
Advice
Here are some general comments and thoughts that I
hope will help you to focus your time and efforts
where they are most effective.
Read the Rubrics! Points are awarded
according to the rubrics, so if you miss
something that's required, you won't get points for
it. Conversely, if you do something that's not
on the rubric, you won't get points for it.
Budget your time.
The most common cause of failure in the project is
running out of time. A day this week is worth
just as much as the day before the deadline! With that
in mind, you are very strongly encouraged to make a
substantial start on the project immediately.
I won't have any way to check that you’ve taken my
advice on this, but you will put yourself at a huge
disadvantage if you do not get started right away.
Effort where it matters most: The grading
scheme for the project/term paper will be based on the
items listed above in the section about the project
report, so you should follow that, and pay attention to
the rubric, as you prepare your proposal and your
report. For example, if your final project report does
not include a significant component illustrating “how
complexity theory shows up in practice”, then you will
miss out on the points that are allocated for that
item — even if the overall quality of your work is
very high. For the same reason, you should avoid
spending too much time on details that aren’t going to
score you points. For example, writing code that
provides sophisticated ways for entering test cases or
viewing results might help in debugging or
understanding the behavior of your implementation.
However, if it doesn’t relate fairly directly to the
items in the grading scheme, then that code will not
contribute much to your final grade.
Automated testing: You’ll want to run your
implementations on a lot of test cases so that
you can get a good idea of the performance of the
algorithms over a wide range of inputs and input
sizes. For example, you may be sorting lists
that contain millions
of values as you compare sort algorithms. So,
you’ll likely want to write some code for generating
test cases automatically, and for checking that your
algorithms are working correctly. There might even be
more opportunity for demonstrating originality and
initiative in the methods you devise for generating
and checking test cases than in any other part of the
project. (After all, I'm not asking you to be original
with your algorithms.) It is often much easier to get
an understanding of general trends in program behavior
by running a large number of tests automatically than
by running just a few examples by hand.
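As one possible shape for such a harness, the sketch below generates reproducible pseudo-random integer lists and checks the output of a sort function. The names `random_case`, `is_correct_sort`, and `check_sort` are hypothetical, and the sketch assumes that the function under test returns its result as a new or modified list.

```python
import random
from collections import Counter


def random_case(n, seed, lo=0, hi=10**6):
    """Return a reproducible pseudo-random list of n integers in [lo, hi]."""
    rng = random.Random(seed)  # private generator: the case can be rebuilt from its seed
    return [rng.randint(lo, hi) for _ in range(n)]


def is_correct_sort(original, result):
    """Check that result is in nondecreasing order and is a permutation of original."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(original) == Counter(result)
    return in_order and same_items


def check_sort(sort_fn, sizes=(0, 1, 2, 10, 1000), seeds=range(5)):
    """Run sort_fn on several sizes and seeds; return the (n, seed) pairs that failed."""
    failures = []
    for n in sizes:
        for seed in seeds:
            data = random_case(n, seed)
            result = sort_fn(list(data))  # pass a copy so in-place sorts cannot alter data
            if not is_correct_sort(data, result):
                failures.append((n, seed))
    return failures
```

Calling `check_sort(sorted)` with Python's built-in `sorted` should return an empty list. Because every failure is reported with its size and seed, a failing case can be regenerated exactly for debugging.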
Automated management
of results: As you run tests, you will
accumulate lots
of data. It’s easy to lose track of it.
Consider writing results to (systematically-named)
files, and making sure that the provenance of the data
is clear: the date and version of the code, where the
input data came from, etc. Also consider writing the
files in a form (such as tab-separated text) that can
be opened by a spreadsheet without any additional hand
processing.
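A minimal sketch of this idea follows, assuming that each measurement is an (algorithm, n, seconds) triple. The function names, the column names, and the values in the example call are all made up for illustration; they are not real measurements.

```python
import csv
import datetime
import io


def results_filename(algorithm, code_version, day):
    """Build a systematic file name such as 'quicksort_v3_2019-03-01.tsv'."""
    return f"{algorithm}_{code_version}_{day.isoformat()}.tsv"


def write_results(stream, rows, code_version, input_source, day):
    """Write (algorithm, n, seconds) rows as tab-separated text.

    Every row repeats the provenance fields, so the file stays a plain table.
    """
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "code_version", "input_source", "algorithm", "n", "seconds"])
    for algorithm, n, seconds in rows:
        writer.writerow([day.isoformat(), code_version, input_source, algorithm, n, seconds])


# Example with an in-memory stream and made-up values; a file opened with
# open(results_filename(...), "w", newline="") could be passed in the same way.
day = datetime.date(2019, 3, 1)
buffer = io.StringIO()
write_results(buffer, [("quicksort", 1000, 0.0021)], "v3", "random seed 7", day)
```

Repeating the provenance in every row costs some space, but it means that rows copied into another spreadsheet still say where they came from.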
Measurements:
You’ll likely want to find a method to measure
execution times as part of your program instead of
having to rely on manual readings from a watch or on an
external script. Have you found out how to access a
clock or timer from whatever programming language you
are using? If the timer that you use doesn’t
have a very high resolution, then you might want to
divide the time that it takes to run the same test k times (for some
large enough k) by k to obtain a more
accurate measurement for a single test run. In
other words, time 10 or 100 rounds, and then divide
the time by 10 or 100.
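In Python, for instance, the standard library's `time.perf_counter` provides a high-resolution clock. The sketch below times k calls and divides by k. The names `time_per_run` and `make_input` are hypothetical, and the sketch assumes that a fresh input is wanted for every call, so that an in-place algorithm is not handed data that an earlier round already processed.

```python
import time


def time_per_run(fn, make_input, k=100):
    """Return the average seconds per call of fn over k calls on fresh inputs."""
    inputs = [make_input() for _ in range(k)]  # built before timing so setup is not measured
    start = time.perf_counter()
    for data in inputs:
        fn(data)
    elapsed = time.perf_counter() - start
    return elapsed / k
```

For example, `time_per_run(sorted, lambda: list(range(1000, 0, -1)), k=100)` estimates the time for one call of `sorted` on a reversed list of 1000 values. For very large inputs, holding k copies in memory may not be practical, and a smaller k may be the better trade-off.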
Throw one (or more)
away: The results from the first (and
sometimes the first few hundred ...) runs of an
algorithm are often atypical. The implementation
may have been compiling or loading code, warming
caches, or doing other housekeeping tasks. Time
the first few runs separately from those that
follow. Does the time start high, decrease, and
then level off? You may well decide to “throw
away” those results. This is especially true for
language implementations that do “just-in-time”
compilation: you want to time your algorithm, not the
JIT compiler.
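One way to look for this effect is to time every run individually and then examine the early runs apart from the rest. The sketch below does so. The function names are hypothetical, and the default of 5 warm-up runs is an arbitrary assumption that you would adjust after looking at your own data.

```python
import time


def timed_runs(fn, make_input, runs=30):
    """Time each of `runs` calls separately and return the durations in seconds."""
    durations = []
    for _ in range(runs):
        data = make_input()  # fresh input, built outside the timed region
        start = time.perf_counter()
        fn(data)
        durations.append(time.perf_counter() - start)
    return durations


def split_warmup(durations, warmup=5):
    """Separate the first `warmup` timings from the rest so they can be inspected apart."""
    return durations[:warmup], durations[warmup:]
```

If the first group is consistently slower than the second, that suggests a warm-up effect, and the second group is probably the better basis for your reported results.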
If you are measuring elapsed time, you may also find
that most runs take the same time, but a few outliers
take significantly longer. Think about why this might
be. Are they reproducible? What do these
outliers tell you about the algorithm?
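To make such outliers easy to spot, one option is to compare each run with the median of all runs. In the sketch below, the name `find_outliers` is hypothetical and the threshold of 3 times the median is an arbitrary assumption, not a standard rule.

```python
import statistics


def find_outliers(durations, factor=3.0):
    """Return (index, duration) pairs for runs slower than factor times the median."""
    median = statistics.median(durations)
    return [(i, d) for i, d in enumerate(durations) if d > factor * median]
```

Keeping the index of each flagged run makes it possible to rerun the same input and see whether the slowdown recurs.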
Questions?
In spite of the details here, the project component
of CS350 is still rather open-ended. Please do not
hesitate to ask if you have any questions or need more
guidance or input. Piazza is the best venue, so
that others can benefit from the answers.
Andrew P. Black