Project 2

Spring 2011  CS 410/584 Algorithm Design & Analysis

Overview

The goal of this project is to try out two different string-matching approaches on several kinds of data.

 

This assignment can be done in pairs, in any programming language you wish, but each person must turn in his or her own independent write-up, described below. The due date for this assignment is Thursday, 26 May.

Algorithms

You should implement two different algorithms. You can use whatever programming language you wish.

 

Knuth-Morris-Pratt: One of the algorithms to implement is the Knuth-Morris-Pratt algorithm discussed in class.
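
For orientation, here is a minimal Python sketch of one common formulation of Knuth-Morris-Pratt with a comparison counter. It is not necessarily the version presented in class; the names failure_function and kmp_search, the 0-based positions, and the choice to report overlapping matches are all assumptions made for illustration. Comparisons made while building the failure table are deliberately not counted.

```python
def failure_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it. Pre-processing only, so nothing is counted."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return (list of 0-based match positions, pattern-to-text comparisons)."""
    if not pattern:
        return [], 0
    fail = failure_function(pattern)
    matches, comparisons = [], 0
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while True:
            comparisons += 1          # one pattern-to-text comparison
            if ch == pattern[k]:
                k += 1
                break
            if k == 0:                # nothing matched; move to next text char
                break
            k = fail[k - 1]           # fall back and retry the same text char
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]           # keep going to find overlapping matches
    return matches, comparisons
```

With this sketch, kmp_search("abababa", "aba") returns the positions [0, 2, 4] after 7 comparisons, since every text character is compared exactly once in that run.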

 

Right-to-Left: The second algorithm should match the pattern against the text from right to left and use a shift function to advance the pattern after a mismatch. This technique is used in the Boyer-Moore matching algorithm, which is not discussed in the textbook but is described in other sources.
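
As a rough illustration of the right-to-left idea, the Python sketch below uses only a bad-character (last-occurrence) shift. It is a simplification, not the full Boyer-Moore algorithm, which also uses a good-suffix rule. The names last_occurrence and right_to_left_search, the 0-based positions, and the shift of one position after a full match are assumptions made for this example.

```python
def last_occurrence(pattern):
    """Map each character to the index of its rightmost occurrence in pattern.
    Pre-processing only, so no comparisons are counted here."""
    return {ch: i for i, ch in enumerate(pattern)}


def right_to_left_search(text, pattern):
    """Return (list of 0-based match positions, pattern-to-text comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    last = last_occurrence(pattern)
    matches, comparisons = [], 0
    s = 0  # index in text where the pattern is currently aligned
    while s <= n - m:
        j = m - 1
        while j >= 0:                 # compare from the right end of the pattern
            comparisons += 1
            if pattern[j] != text[s + j]:
                break
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1                    # conservative shift after a full match
        else:
            # Line up the mismatched text character with its rightmost
            # occurrence in the pattern, always moving at least one position.
            s += max(1, j - last.get(text[s + j], -1))
    return matches, comparisons
```

For example, right_to_left_search("abcabcab", "cab") returns the positions [2, 5] after 8 comparisons. Two of the four alignments are rejected after a single comparison, which is the kind of saving the shift function is meant to provide.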

 

Both algorithms should keep a count of the comparisons they make. This count should include only pattern-to-text character comparisons, not comparisons made while pre-processing the pattern.

 

Each algorithm should output all the positions at which the pattern matches the text, or “no match” if there are none. Each should also report the number of comparisons made.
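
One possible way to produce this output is sketched below in Python. The function name format_result and the exact line layout are assumptions; the handout does not prescribe a format or say whether positions are 0-based or 1-based, so state your convention in the write-up.

```python
def format_result(algorithm, pattern, matches, comparisons):
    """Build one report line: match positions (or "no match") plus the count."""
    where = ", ".join(str(p) for p in matches) if matches else "no match"
    return (f"{algorithm}  pattern={pattern!r}  "
            f"positions: {where}  comparisons: {comparisons}")


# Example values only; in a real run these come from the matcher.
print(format_result("KMP", "aba", [0, 2, 4], 7))
print(format_result("KMP", "xyz", [], 5))
```

Printing the pattern with !r keeps leading or trailing spaces in a pattern visible in the output.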

Data and Test Runs

Each test set consists of a 1000-character text string, and a file of 5 patterns, one per line. You should run each algorithm on each text string with all five corresponding patterns.
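
A small loader along the following lines may help when reading the test files. This is only a sketch: the function name load_test_set and the file paths passed to it are hypothetical, and it assumes the text is stored as a single line of ASCII. Only line terminators are stripped, so spaces that belong to the text or to a pattern are preserved.

```python
def load_test_set(text_path, patterns_path):
    """Read one text string and its list of patterns (one pattern per line)."""
    with open(text_path, encoding="ascii") as f:
        text = f.read().rstrip("\r\n")        # drop only the trailing newline
    with open(patterns_path, encoding="ascii") as f:
        # Keep spaces inside patterns; skip lines that are completely empty.
        patterns = [line.rstrip("\r\n") for line in f if line.rstrip("\r\n")]
    return text, patterns
```

Checking len(text) against the expected 1000 characters and len(patterns) against 5 after loading is a cheap way to catch a truncated download or a stray newline.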

 

Test set #1, binary string: text    patterns

Test set #2, random ASCII:  text    patterns

Test set #3, English ASCII: text    patterns

 

Note: You may want to try additional text or patterns, in order to support your analysis of the algorithms’ behaviors. If you do so, please include the text and patterns you use with your test runs.

Write Up

Each person in the group must turn in his or her own write-up, which is expected to be about 2 pages plus test output. The write-up will have three parts.

  1. An English description of, and pseudocode for, the right-to-left algorithm you implement. If you consult other material in developing your implementation, please give a citation.
  2. An example of how your algorithm constructs the shift functions or tables for the right-to-left algorithm, using one of the patterns in the test data.
  3. An analysis of your test data.
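
To illustrate the kind of shift-table example asked for in part 2, the Python sketch below builds a Horspool-style shift table. This is one possible shift function, not a required one, and the pattern "abcab" is made up for illustration rather than taken from the test data. The name horspool_shift_table is likewise an assumption.

```python
def horspool_shift_table(pattern):
    """For each character in pattern[:-1], record how far its rightmost
    occurrence is from the last pattern position. Characters that are not in
    the table shift the pattern by its full length."""
    m = len(pattern)
    shift = {}
    for i, ch in enumerate(pattern[:-1]):
        shift[ch] = m - 1 - i     # later occurrences overwrite earlier ones
    return shift


pattern = "abcab"                 # illustrative pattern, not from the test sets
table = horspool_shift_table(pattern)
for ch in sorted(table):
    print(f"shift[{ch!r}] = {table[ch]}")
print(f"any other character = {len(pattern)}")
```

For "abcab" the loop assigns a: 4, b: 3, c: 2 and then overwrites a with 1, giving shift['a'] = 1, shift['b'] = 3, shift['c'] = 2, and a default shift of 5. A write-up example could walk through the same steps for one of the actual test patterns.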

Also turn in the output of each algorithm on all three test sets. (People from the same group can turn in a copy of the same test runs.)

 

You do not need to turn in a listing of your program.

Grading Scheme

There are 80 points possible, divided as follows.

 

A. (20 points) Test runs

B. (20 points) Description of approach and pseudocode for right-to-left algorithm

C. (15 points) Example of shift-function construction

D. (25 points) Analysis of test data