How can software be "probably" correct?
Software theoreticians have always been uncomfortable
applying the probabilistic
idea of reliability to programs. Perhaps this is because
one of the attractive aspects of programming is the precise,
deterministic nature of the beast. There is no doubt what a
program does, and all of it directly results from human decisions
made by the programmer.
For any particular input, a program either works or it doesn't
work, so what is the justification for assigning a probability
to this situation?
A shift in viewpoint is required to understand the basis of
software reliability.
The software minefield
An analogy may be helpful. Picture a large field where landmines
are tested. Over the years, many mines have been laid in the field,
without any apparent plan, and forgotten. Now a person comes upon
the field, knowing only that there may be live mines somewhere, and wishes
to assess the chances of safely walking on the ground.
The only sensible way to do this is to sample the terrain. The person
might throw rocks onto a number of locations. If there are no
explosions, is it likely to be safe? It is possible to quantify
both the degree of safety and the confidence in that estimate, in
terms of the number of probes made.
(For details, look at the
definitions and equations.)
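To make that quantification concrete, here is a minimal sketch, in Python,
of one common form of the calculation (not necessarily the exact equations
referenced above). It assumes the probes are independent and are thrown the
same way the ground will later be walked; if n such probes cause no
explosions, the confidence that the chance of an explosion is at most some
bound theta is 1 - (1 - theta)^n.

    def confidence(n, theta):
        """Confidence that the failure (explosion) probability is at most
        theta, given n independent probes with no failures observed."""
        return 1.0 - (1.0 - theta) ** n

    # For example, 3000 failure-free probes give roughly 95% confidence
    # that the failure probability is below 1 in 1000:
    print(confidence(3000, 0.001))   # about 0.95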
The more that is known about how the person intends to walk,
the better the probes can be.
The analogy is clear: the minefield is the input space of a program,
and the mines are the inputs that will result in program failures.
The rocks thrown are software tests, and the absence of explosions
corresponds to the absence of test failures.
The "probable incorrectness" of the field
is the probability that a rock thrown at random will set off
an explosion.
The whole thing is perfectly determined in advance,
yet assessing the situation with a statistical model seems the
only possibility.
In the analogy, knowledge about the path to be walked
corresponds to the operational profile of expected program usage.
When the path is known, it is obviously better to throw all of
the rocks along that path. For the program, the "probable
incorrectness" is technically
the failure probability for an input chosen at random according
to the operational profile -- that is, chosen in a way representative
of expected usage.
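As an illustration of what "chosen according to the operational profile"
means in practice, here is a minimal sketch; the usage classes, their
weights, and the run_program check are hypothetical stand-ins invented
for this example.

    import random

    # Hypothetical operational profile: classes of use and the fraction
    # of expected usage each represents (weights sum to 1).
    profile = {"small_update": 0.70, "bulk_load": 0.25, "admin_command": 0.05}

    def draw_test():
        """Pick a usage class at random, weighted by the profile."""
        classes = list(profile.keys())
        weights = list(profile.values())
        return random.choices(classes, weights=weights, k=1)[0]

    def run_program(usage_class):
        """Stand-in for running the program on an input drawn from this
        class and checking its result; True means the test passed."""
        return True

    n = 1000
    failures = sum(1 for _ in range(n) if not run_program(draw_test()))
    print(failures / n)   # estimated failure probability under the profile

If every such test passes, the confidence calculation sketched earlier
applies to exactly this notion of failure probability.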
Of course the apparent statistical properties of minefields
(and by analogy, programs)
depend on how the mines are distributed and how the samples are
taken. There are obviously pathological cases in which the
statistical approach is unwise. For example,
if the mines are concentrated in a small region that the path to be
walked crosses, but the rocks are thrown in a uniform pattern too
sparse to find that region, all bets are off: the probes suggest
safety where there is none.
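A small simulation makes the pathology visible; the input space, the
failure region, and the usage range below are invented solely for
illustration.

    import random

    SPACE = 1_000_000                      # the whole field (input space)

    def fails(x):
        """The mines: failures concentrated in one small region."""
        return 1000 <= x < 2000

    # Sparse uniform probing: 100 rocks thrown anywhere in the field.
    uniform_hits = sum(fails(random.randrange(SPACE)) for _ in range(100))

    # The path actually walked: usage concentrated near the mined region.
    path_hits = sum(fails(random.randrange(10_000)) for _ in range(100))

    print(uniform_hits)   # very likely 0: "no explosions", looks safe
    print(path_hits)      # typically around 10 of 100: the path is dangerous

Uniform probing reports a clean field, while probing along the path, as an
operational profile would dictate, reveals the true danger.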