Component Synthesis System Documentation

2 March 2007

INTRODUCTION

This general documentation describes a complex of software tools that
support experimentation with "software components," executable units
meant to be combined into systems but developed independent of any
system.  The purpose of the tools is to explore fundamental questions
about the idea of independent development and composition from the
testing viewpoint.  They were motivated by theoretical ideas of Hamlet,
Mason, and Woit (first published at ICSE 2001) about the possibility of
calculating system properties from measured properties of components --
they implement algorithms for component measurement and for synthesis
of those measurements into system properties, both functional and
non-functional.  The theory and tools are based on testing over a
collection of input- and persistent-state subdomains.  The work was
originally supported by NSF ITR grant CCR-0112654, and later by an E.T.S.
Walton grant from Science Foundation Ireland.

The tools are all Perl scripts, which have been tried only on Linux/GNU
systems.  Detailed documentation exists for two groups of tools, one
for stateless components and one for components with persistent state.
Each script also includes header comments that are something like a Unix
man-page description.  This document is intended as a higher-level
introductory narrative for users rather than for implementers.

ASSUMPTIONS AND RESTRICTIONS

As befits foundational work, the components and systems involved are
reduced to the simplest cases, both to aid in understanding and to permit
full implementations.  The components are arbitrary executable programs,
but they must observe stringent input-output restrictions:  they must have
but one floating-point input and one floating-point output value and a
single non-functional property (run-time or reliability) with float value,
and if they use persistent state, it must also be a single float value.
These restrictions are implemented by requiring that component executables
take input from STDIN, write output to STDOUT, write non-functional
property values to STDERR, and read and write persistent state values
from/to a file named "<code>.state", where <code> is the base name of the
executable file.  The systems built using these components are also
restricted:  they must be formed by combining components in sequence,
in conditionals, or in loops, the three "structured" constructors;
the constructions may be arbitrarily nested.  A system is described in
reverse Polish notation using an operator for each construct and names
for the components as operands.

EXAMPLE COMPONENT/SYSTEM

For example, a very simple system might be constructed from a
single stateless component with the Perl code:

  #! /usr/bin/perl 
  $x = <STDIN>;
  $y = 10.0*abs(sin($x)); 
  print "$y\n"; 
  $r = 2 + $y/5.0; 
  print STDERR "$r\n";

The non-functional property values sent to STDERR will henceforth be described 
as "run time," although they are arbitrary values.

A system where this component is composed with itself twice in series
is described by the Polish:

  1 1 S 1 S

where "S" is the sequence-composition operator and "1" represents the
component.

To construct an experiment using the tools, the above suffices, except
for one crucial idea:  Explicit subdomains are used to describe and
test components.  For the above, it might be reasonable to use (say)
11 interval subdomains: [0,1), [1,2), ..., [10,11).  To test over these
subdomains, 7 points in each might be adequate.

To reiterate, a person uses the system by supplying:

(1) Executable component code(s);

(2) Appropriate subdomains over which to sample component executions; and

(3) A system structure in Polish.

The tools can then do two general operations, which we describe for the
example above.  The reader might want to follow along by first reading
the description to follow on how to use the tools, and letting them do
the work.

First, the tools execute each component, sampling it uniformly in
each subdomain and recording its average output and run time there.
In the example this yields a pair of tables, each with 11 entries,
relating subdomains to outputs and to run times, respectively.
For example, the fourth test (out of seven) in subdomain [1,2) would
have input 1.5, output 9.97, and run time 3.99. All values sampled
in interval [1,2) average to 9.39 and 3.88 respectively, so the two table
entries would be ([1,2), 9.39) and ([1,2), 3.88).  These tables constitute
a testing approximation to the behaviour of the executable code, an
approximation that is subsequently used to predict system properties.

Second, the given system is synthesized from the components and its
behaviour is calculated.  This means constructing two more 11-entry tables,
but now for the system.  One entry in these system tables would be found
by starting with subdomain [1,2) and calculating the system values.  The
average component output for [1,2) is 9.39, which falls in subdomain
[9,10) of the second component, and the corresponding average output
there is 2.84, which falls in subdomain [3,4) of the third component,
with average output 2.50.  Thus the system output on input 1.5 (or any other
input in the [1,2) subdomain) is predicted to be 2.50.  The run time is similarly
calculated using the other measured table, but in conjunction with the
functional input-output table since to find the run time of the second
component in place in the system for a system input in [1,2) requires
the information that the second component is invoked in [9,10), and so on.

The tools do an error analysis as well.  At component level, they
report the deviation of the subdomain approximation from the actual
component behaviour.  At the system level, although worst-case errors
could be predicted, the tools instead create the described system
using the original component code and execute it to see how far off the
predictions actually are.  In the example, the component r-m-s error on
[1,2) is about 7.4%, while the r-m-s system error there is about 50%.
The latter involves the errors on [9,10) and on [3,4), which are
22.7% and 35.6% respectively.  Evidently, to get good system predictions will
require more subdomains for a more accurate component approximation.

USING THE TOOLS

The user must supply the three pieces of information to describe
components and system.  This is done by creating a directory into which
the tool scripts are copied, and placing description files there for
them to work on.  Again for the example, suppose that a directory "Exa"
is created and the tools placed there.  The executable code files for
each component go into Exa, in the example say under the arbitrary name
"Ecode.bin", which is made executable (with chmod).  The subdomain
structure is described in a configuration file whose extension must be
".ccf", say "Esubd.ccf".  The .ccf file format is a list of the subdomains
in order as pairs, with the sample count following.  The first line
lists the executable file.  Thus "Esubd.ccf" is:

  Ecode.bin 
  0 1 7 
  1 2 7 
  2 3 7 
  ...  
  9 10 7
  10 11 7

The system description is placed in a file that must be named
"system.pscf" which contains the reverse Polish description and a list
of the components, here:

  1 1 S 1 S 
  Esubd.ccf

In the Polish first line, the component numbers are indicies into
subsequent lines of the file.

Now everything is ready in directory Exa, which is made the current
directory.  The component measurement is done by the command

  COMP -V -G

in which the options specify that informational messages will be printed
as various scripts execute (-V) and that the measured results are to be
displayed graphically (-G).  The tabular approximation files have the
same name as the subdomain descriptions, but with "t" added, so here
"Esubd.ccft" is created by the COMP script.

To make system predictions, the command is

  SYNF -V -G

(same options as above).

Typically, a system synthesis reveals deficiencies in the components
or in their subdomains, which requires changes to the .ccf file(s)
and repeating these commands.  In the example, if the subdomains are
subdivided until there are 352 instead of 11, the component approximation
error is reduced to below 1.5% in all subdomains and the system error 
is 3.45% in [1.50,1.53).

TO LEARN MORE

The example above has barely scratched the surface of the tools' 
capabilities.  To list a few:

* The non-functional property can be reliability in place of run time.

* The components may have private persistent state.

* For stateless components, the measured approximation may be a best-fit
line on each subdomain in place of an average.

* There is a trace facility that can display the usage profiles that
each component sees when in place in the system, and can show the
details of any execution from the starting input through each
component used to the system results.

* There are auxiliary scripts that perform helpful tasks such as
splitting subdomains to obtain a better approximation; graphically
displaying properties of components, subsystems, and systems; and
analyzing the errors in component measurement and system predictions.

* Loop synthesis in the underlying theory, unlike most techniques
for dealing with iteration, is decidable in principle, so the tools
can predict the outcome of arbitrary loops.  (The prediction is based
on measured subdomain approximations, so it may be incorrect.)

Detailed documentation on file formats and on each of the individual
tools is available in the tar packages, one for stateless components
and another for components with state.

CURRENTLY UNDER DEVELOPMENT

Experiments with stateless components use a mature theory of subdomain
approximation and calculation of system properties without executing
the system.  For components with persistent state, the same kind of
results can be obtained even though the property-calculation algorithms
are more complicated and the tools to implement them are not complete.
Instead of calculation, a system can be constructed from the component
approximations (using table-lookup in each measured component).  When such
an approximate system is executed, it reproduces what the theoretical
calculated system properties would be, albeit far less efficiently.

Current status:

* Measurement tools for sampling component execution and recording a
subdomain approximation are complete.  The sampling may be systematic
by (input X state) subdomain or may use random sequences of inputs.

* File formats for recording calculated behaviour of a system are
designed.  The format must allow a system to have a state that is
the cross product of states of its components, and hence of ever-growing
size as more components are employed.

* A few cases of property calculation have been implemented and tested,
including the crucial one of components with state in arbitrary series
connections.
