The goal of this project is to try out two different string-matching approaches on several kinds of data.
This assignment can be done in pairs, in any programming language you wish, but each person must turn in his or her own independent write-up, described below. The due date for this assignment is Thursday, 26 May.
You should implement two different algorithms, described below.
Knuth-Morris-Pratt: One of the algorithms to implement is the Knuth-Morris-Pratt algorithm discussed in class.
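As a point of reference, here is a minimal Python sketch of Knuth-Morris-Pratt with a comparison counter. It is one common formulation and may differ in detail from the version presented in class. The names build_failure and kmp_search, and the choice of 0-based match positions, are assumptions made for illustration. Comparisons made while building the failure table are deliberately not counted.

```python
def build_failure(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it. Pre-processing only: not counted."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text, pattern):
    """Return (positions, comparisons); positions are 0-based (an assumption)."""
    if not pattern:
        return [], 0
    failure = build_failure(pattern)
    positions = []
    comparisons = 0
    i = 0  # index into text
    k = 0  # number of pattern characters currently matched
    while i < len(text):
        comparisons += 1  # one pattern-to-text comparison
        if text[i] == pattern[k]:
            i += 1
            k += 1
            if k == len(pattern):
                positions.append(i - k)
                k = failure[k - 1]  # keep going to find overlapping matches
        elif k > 0:
            k = failure[k - 1]  # fall back in the pattern, text index stays put
        else:
            i += 1
    return positions, comparisons
```

For example, kmp_search("abababa", "aba") finds the overlapping matches at positions 0, 2 and 4 using 7 comparisons.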
Right-to-Left: You should implement a second algorithm that matches the pattern against the text from right to left, and uses a shift function to advance the pattern when a mismatch occurs. This technique is used in the Boyer-Moore matching algorithm, which is not discussed in the textbook but is available in other sources.
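To make the idea of right-to-left matching with a shift function concrete, here is a hedged Python sketch. It uses the simplified last-character shift rule usually attributed to Horspool, which is only one possible shift function and is not the full Boyer-Moore algorithm. The names build_shift and right_to_left_search are hypothetical, and positions are assumed to be 0-based.

```python
def build_shift(pattern):
    """Shift table (Horspool-style, an assumed choice of shift function).
    For each character in pattern[:-1], store the distance from its last
    occurrence to the end of the pattern. Pre-processing: not counted."""
    m = len(pattern)
    shift = {}
    for j in range(m - 1):
        shift[pattern[j]] = m - 1 - j
    return shift


def right_to_left_search(text, pattern):
    """Return (positions, comparisons); positions are 0-based (an assumption)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    shift = build_shift(pattern)
    positions = []
    comparisons = 0
    s = 0  # current alignment of the pattern's left end in the text
    while s <= n - m:
        j = m - 1
        while j >= 0:  # compare from the right end of the pattern
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            positions.append(s)
        # Shift by the table entry for the text character under the
        # pattern's last position; characters not in the table shift by m.
        s += shift.get(text[s + m - 1], m)
    return positions, comparisons
```

For example, right_to_left_search("hello world", "world") reports a match at position 6 after 7 comparisons, because the first two alignments are rejected with a single comparison each.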
Both algorithms should keep a count of comparisons made. This count should only include pattern-to-text comparisons, and not comparisons made in pre-processing the pattern.
Each algorithm should output all the positions at which the pattern matches the text, or “no match” if there are none. Each should also list the number of comparisons made.
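The assignment does not prescribe an exact output layout, so the following Python sketch shows just one possible way to report a result. The function name format_result and the line format are assumptions.

```python
def format_result(name, positions, comparisons):
    """Build one report line for a single (algorithm, pattern) run.
    The layout is a hypothetical choice, not a required format."""
    where = ", ".join(str(p) for p in positions) if positions else "no match"
    return f"{name}: {where} ({comparisons} comparisons)"


print(format_result("KMP", [0, 2, 4], 7))  # KMP: 0, 2, 4 (7 comparisons)
print(format_result("KMP", [], 4))         # KMP: no match (4 comparisons)
```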
Each test set consists of a 1000-character text string and a file of five patterns, one per line. You should run each algorithm on each text string with all five corresponding patterns.
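Running every algorithm on every pattern of a test set could be organized along the lines of the Python sketch below. The file paths, the function names load_test_set and run_test_set, and the convention that each search function returns a (positions, comparisons) pair are all assumptions. The naive_search function is only a stand-in so that the sketch runs on its own; it is not one of the two required algorithms.

```python
from pathlib import Path


def naive_search(text, pattern):
    """Placeholder matcher for illustration; returns (positions, comparisons)."""
    positions, comparisons = [], 0
    for s in range(len(text) - len(pattern) + 1):
        j = 0
        while j < len(pattern):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
            j += 1
        if j == len(pattern):
            positions.append(s)
    return positions, comparisons


def load_test_set(text_path, patterns_path):
    """Read the text string and the patterns (one per line, blanks skipped)."""
    # Only the trailing newline is removed, since spaces may be real data.
    text = Path(text_path).read_text().rstrip("\n")
    lines = Path(patterns_path).read_text().splitlines()
    patterns = [line for line in lines if line]
    return text, patterns


def run_test_set(text, patterns, algorithms):
    """Run each named search function on each pattern.
    Returns a list of (pattern, name, positions, comparisons) rows."""
    rows = []
    for pattern in patterns:
        for name, search in algorithms.items():
            positions, comparisons = search(text, pattern)
            rows.append((pattern, name, positions, comparisons))
    return rows
```

Passing a dictionary such as {"KMP": kmp_search, "Right-to-Left": right_to_left_search} (hypothetical names for your own implementations) keeps the test loop independent of the algorithms, which makes it easy to add extra texts or patterns later.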
Test set #1, binary string: text patterns
Test set #2, random ASCII: text patterns
Test set #3, English ASCII: text patterns
Note: You may want to try additional text or patterns, in order to support your analysis of the algorithms’ behaviors. If you do so, please include the text and patterns you use with your test runs.
Each person in the group must turn in his or her own write-up, which is expected to be about 2 pages plus test output. The write-up will have three parts.
Also turn in the output of each algorithm on all three test sets. (People from the same group can turn in a copy of the same test runs.)
You do not need to turn in a listing of your program.
There are 80 points possible, divided as follows.
A. (20 points) Test runs
B. (20 points) Description of approach and pseudocode for right-to-left algorithm
C. (15 points) Example of shift-function construction
D. (25 points) Analysis of test data