CS301 Winter 1999 - Regression Testing

Regression testing is a common approach to testing compilers. The general idea is to maintain a test suite and the compiler outputs--the `generated code' and the diagnostics--for each program in the suite. In the early development stages, the `generated code' may not be code at all, but some other output, e.g., a list of variables and types or intermediate code. These outputs are called the baseline.

Each time the compiler is changed, it is tested with the programs in the test suite and the results are compared with the baseline. For many changes, e.g., algorithm improvements, the output is identical to the baseline. Unexpected differences suggest errors. For other changes, e.g., generating better code or supporting additional source-language features, the output will differ from the baseline in presumably expected ways. Once the changes are accepted as correct, the latest outputs are established as the new baseline.
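The compare-and-promote cycle described above can be sketched in a few shell commands. This is a hypothetical sketch, not part of the course setup: the file names are stand-ins, and the line that would actually run your compiler is shown only as a comment.

```shell
# Hypothetical sketch of one regression check. The file names are
# stand-ins; the compiler invocation is elided (see the comment below).
mkdir -p tst
echo 'program hello' > tst/hello.src           # stand-in test program
echo 'hello: output v1' > tst/hello.out.bak    # existing baseline
echo 'hello: output v1' > tst/hello.out        # latest compiler output
# (in a real test: mycc tst/hello.src > tst/hello.out)
if diff tst/hello.out.bak tst/hello.out > /dev/null
then echo 'hello: matches baseline'
else echo 'hello: differs -- inspect, then promote with cp'
fi
```

If the diff is empty the change is accepted silently; otherwise you inspect the difference and, once satisfied, copy the new output over the `.bak' file to establish the new baseline.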

A small test suite for PCAT is in /u/cs301acc/pcat. (You are encouraged to mail additional programs to apt for inclusion in the suite.) To test some compiler phases (e.g., parsing or typechecking), it is useful to include invalid programs as well as valid ones in the test suite.

Details of the testing scheme depend on the compiler stage being tested. When the output of the compiler is some intermediate file (e.g., an ast), its contents can be compared with expected contents. When the `generated code' is actually executable (either directly or via an interpreter), testing should also execute it, and compare the outputs with the expected outputs.
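When the generated code is runnable, the same diff-based scheme applies to the run-time output. A minimal sketch, in which the execution step is a stand-in (the real command depends on how your generated code is run):

```shell
# Minimal sketch: execute the generated code on the test's input and
# compare the run's output with the expected output. The output is
# faked here; a real test would run something like:
#   interp tst/fibb.code < tst/fibb.in > tst/fibb.out
mkdir -p tst
printf '10\n' > tst/fibb.in        # test input (cf. fibb.in in the suite)
printf '55\n' > tst/fibb.out.bak   # expected run output (baseline)
printf '55\n' > tst/fibb.out       # stand-in for the actual run
diff tst/fibb.out.bak tst/fibb.out && echo 'fibb: run output matches'
```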

This testing scheme can be automated using a shell script. For example, you can test your interpreter with the help of the script /u/cs301acc/3/dotest, which compiles each test program to an .ast file and compares the outputs with the baseline. If the baseline files don't exist, dotest creates them from the current outputs. Here is dotest:

\begin{code}#!/bin/sh
cd tst
for i
do
  d=`dirname $i`
  f=`basename $i .pcat`
  # run the compiler phase; adjust this line to your compiler's interface
  ../parser < $d/$f.pcat > $f.ast 2> $f.err
  if [ -r $f.ast.bak ]; then diff $f.ast.bak $f.ast;
  else cp $f.ast $f.ast.bak; fi
  if [ -r $f.err.bak ]; then diff $f.err.bak $f.err;
  else cp $f.err $f.err.bak; fi
done
exit 0\end{code}

The compiler outputs and the baseline go in a private directory ./tst. For the test program /u/cs301acc/pcat/fibb.pcat, the outputs are in tst/fibb.ast and tst/fibb.err, and tst/fibb.ast.bak and tst/fibb.err.bak are the baseline. Later on, when we want to actually run the test programs, the test input for fibb will be in /u/cs301acc/pcat/fibb.in. The other test programs are handled similarly; for those that do not read input, there will be no .in file.

Note that the baseline is not part of the test suite; you must construct it. You may wish to construct the initial baseline by using my executable; appropriate files for the existing test suite are in /u/cs301acc/3/tst.
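For instance, seeding the baseline amounts to copying the reference files into your private tst directory. A hedged sketch: the cp line uses the class-machine path from above and assumes the reference files follow the `.bak' naming convention; the touch lines are stand-ins so the sketch is self-contained.

```shell
# Hedged sketch: seed ./tst with reference baseline files.
mkdir -p tst
# On the class machines (path from the handout, assuming .bak naming):
#   cp /u/cs301acc/3/tst/*.bak tst/
# Stand-ins so this sketch runs anywhere:
touch tst/fibb.ast.bak tst/fibb.err.bak
ls tst
```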

Once the outputs are deemed acceptable, you can use /u/cs301acc/3/newbaseline to establish a new baseline:

\begin{code}#!/bin/sh
cd tst
for j in *.ast *.err ; do
  if [ -r $j ]; then
    echo tst/$j 1>&2
    mv $j $j.bak
  fi
done
exit 0\end{code}
Things can be further automated by adding a makefile entry that runs dotest on a specified test suite. The entry looks something like this:
\begin{code}test: parser /u/cs301acc/pcat/*.pcat tst/*.pcat makefile
	dotest /u/cs301acc/pcat/*.pcat tst/*.pcat\end{code}
This entry also permits an optional private test suite containing your own PCAT programs in tst/*.pcat. The command make test builds the parser executable (parser) and runs dotest on both test suites. (Remember that the dotest command line in the makefile must begin with a tab.)

Andrew P. Tolmach