Component Synthesis System Documentation

2 April 2009

INTRODUCTION

This general documentation describes a complex of software tools that
support experimentation with "software components," executable units
meant to be combined into systems but developed independent of any
system.  The purpose of the tools is to explore fundamental questions
about the idea of independent development and composition from the
testing viewpoint.  They were motivated by theoretical ideas of Hamlet,
Mason, and Woit (first published at ICSE 2001) about the possibility of
calculating system properties from measured properties of components --
they implement algorithms for component measurement and for synthesis
of those measurements into system properties, both functional and
non-functional.  The theory and tools are based on testing over a
collection of input- and persistent-state subdomains.  The work was
originally supported by NSF ITR grant CCR-0112654, and later by an E.T.S.
Walton fellowship from Science Foundation Ireland.

The tools are all Perl scripts, which have been most thoroughly tested
on Linux/GNU systems.  They should run without much difficulty on any
UNIX system.  The scripts run on some MS windows systems and on Apple
Mac OS X, but have been infrequently used on those systems, so may be buggy.
Other than Perl itself, the only software required is the GNUPlot
graphing program, which is available for a wide variety of systems.
Each script includes header comments that are something like a
Unix man-page description.  This document is intended as a higher-level
introductory narrative for users rather than for implementers.  The tools
were implemented in four versions, each extending the previous and each
with separate documentation:  for stateless components; for components
with state; for incremental processing; and for concurrent components.
The concurrent version was also revised to permit execution on non-UNIX
systems, by replacing GNU utility actions with internal Perl routines.
For the most part, the concurrent documentation should be comprehensive.

ASSUMPTIONS AND RESTRICTIONS

As befits foundational work, the components and systems involved are
reduced to the simplest cases, both to aid in understanding and to permit
full implementations.  The components are arbitrary executable programs,
but they must observe stringent input-output restrictions:  they must
have but one floating-point input and one floating-point output value
and a single non-functional property (run-time or reliability) with
float value, and if they use persistent state, it must also be a single
float value.  Concurrent components must make only a single exchange of
output/input (and other restrictions) that guarantee synchronization and
absence of deadlock.  These restrictions are implemented by requiring that
component executables take input from STDIN, write output to STDOUT, write
non-functional property values to STDERR, and read and write persistent
state values from/to a file named "<code>.state", where <code> is the base
name of the executable file.  The systems built using these components are
also restricted:  they must be formed by combining components in sequence,
in conditionals, or in loops, the three "structured" constructors; the
constructions may be arbitrarily nested.  Concurrent execution employs
a "master" component that may start another in parallel.  "Master"
components may not have state and must observe nesting restrictions.
A system is described in reverse Polish notation using an operator for
each construct and names for the components as operands.

EXAMPLE COMPONENT/SYSTEM

For example, a very simple system might be constructed from a
single stateless component with the Perl code:

  #! /usr/bin/perl 
  $x = <STDIN>;
  $y = 10.0*abs(sin($x)); 
  print "$y\n"; 
  $r = 2 + $y/5.0; 
  print STDERR "$r\n";

The non-functional property values sent to STDERR will henceforth be 
described as "run time," although they are arbitrary values.

A system where this component is composed with itself twice in series
is described by the Polish:

  1 1 S 1 S

where "S" is the sequence-composition operator and "1" represents the
component.  

To construct an experiment using the tools, the above suffices, except
for one crucial idea:  Explicit subdomains are used to describe and
test components.  For the above, it might be reasonable to use (say)
11 interval subdomains: [0,1), [1,2), ..., [10,11).  To test over these
subdomains, 5 points in each might be adequate.

To reiterate, a person uses the system by supplying:

(1) Executable component code(s);

(2) Appropriate subdomains over which to sample component executions; and

(3) A system structure in Polish.

The tools can then do two general operations, which we describe for the
example above.  The reader might want to follow along by first reading
the description to follow on how to use the tools, and letting them do
the work.

First, the tools execute each component, sampling it uniformly in
each subdomain and recording its average output and run time there.
In the example this yields a pair of tables, each with 11 entries,
relating subdomains to outputs and to run times, respectively.
For example, the fourth test (out of seven) in subdomain [1,2) would
have input 1.5, output 9.97, and run time 3.99. All values sampled
in interval [1,2) average to 9.39 and 3.88 respectively, so the two table
entries would be ([1,2), 9.39) and ([1,2), 3.88).  These tables constitute
a testing approximation to the behaviour of the executable code, an
approximation that is subsequently used to predict system properties.

Second, the given system is synthesized from the components and its
behaviour is calculated.  This means constructing two more 11-entry
tables, but now for the system.  One entry in these system tables would be
found by starting with subdomain [1,2) and calculating the system values.
The average component output for [1,2) is 9.39, which falls in subdomain
[9,10) of the second component, and the corresponding average output
there is 2.84, which falls in subdomain [2,3) of the third component,
with average output 2.50.  Thus the system output on input 1.5 (or any
other input in the [1,2) subdomain) is predicted to be 2.50.  The run
time is similarly calculated using the other measured table, but in
conjunction with the functional input-output table since to find the run
time of the second component in place in the system for a system input in
[1,2) requires the information that the second component is invoked in
[9,10), and so on.

The tools do an error analysis as well.  At component level, they
report the deviation of the subdomain approximation from the actual
component behaviour.  At the system level, although worst-case errors
could be predicted, the tools instead create the described system
using the original component code and execute it to see how far off the
predictions actually are.  In the example, the component r-m-s error on
[1,2) is about 7.4%, while the r-m-s system error there is about 50%.
The latter involves the errors on [9,10) and on [2,3), which are 22.7%
and 35.6% respectively.  Evidently, to get good system predictions will
require more subdomains for a more accurate component approximation.

USING THE TOOLS

The user must supply the three pieces of information to describe
components and system.  This is done by creating a directory into which
the tool scripts are copied, and placing description files there for
them to work on.  Again for the example, suppose that a directory "Exa"
is created and the tools placed there.  The executable code files for
each component go into Exa, in the example say under the arbitrary name
"Ecode", which is made executable (with `chmod +x' on UNIX/Mac systems, and
with some difficulty on MS windows).  The subdomain structure is described
in a configuration file whose extension must be ".ccf", say "Esubd.ccf".
The .ccf file format is a list of the subdomains in order as pairs, with
the sample count following.  The first line lists the executable file.
Thus "Esubd.ccf" is:

  Ecode
  0 1 5 
  1 2 5 
  2 3 5 
  ...  
  9 10 5
  10 11 5

The system description is placed in a file that must be named
"system.pscf" which contains the reverse Polish description and a list
of the components, here:

  1 1 S 1 S 
  Esubd.ccf

In the Polish first line, the component numbers are indicies into
subsequent lines of the file.

Now everything is ready in directory Exa, which is made the current
directory.  The component measurement is done by the command

  COMP -V -G

in which the options specify that informational messages will be printed
as various scripts execute (-V) and that the measured results are to be
displayed graphically (-G).  The tabular approximation files have the
same name as the subdomain descriptions, but with "t" added, so here
"Esubd.ccft" is created by the COMP script.

To make system predictions, the command is

  SYN -V -G

(same options as above).

Typically, a system synthesis reveals deficiencies in the components
or in their subdomains, which requires changes to the .ccf file(s)
and repeating these commands.  In the example, if the subdomains are
subdivided until there are 352 instead of 11, the component approximation
error is reduced to below 1.5% in all subdomains and the system error 
is 3.45% in [1.50,1.53).

TO LEARN MORE

The example above has barely scratched the surface of the tools' 
capabilities.  To list a few:

* The non-functional property can be reliability in place of run time.

* Components may have private persistent state.

* Components may execute concurrently.

* For stateless components, the measured approximation may be a best-fit
line on each subdomain in place of an average.

* A trace facility can display the usage profiles that
each component sees when in place in the system.

* There are auxiliary scripts that perform helpful tasks such as
splitting subdomains to obtain a better approximation; graphically
displaying properties of components, subsystems, and systems; and
analyzing the errors in component measurement and system predictions.

* Loop synthesis in the underlying theory, unlike most techniques
for dealing with iteration, is decidable in principle, so the tools
can predict the outcome of arbitrary loops.  (The prediction is based
on measured subdomain approximations, so it may be incorrect.)

* An "incremental" version of the tools keeps track of a previous
base system and recalculates only to the extent that particular
subdomains in components have changed.
