How can software be "probably" correct?

Software theoreticians have always been uncomfortable applying the probabilistic idea of reliability to programs. Perhaps this is because one of the attractive aspects of programming is the precise, deterministic nature of the beast. There is no doubt what a program does, and all of it directly results from human decisions made by the programmer. For any particular input, a program either works or it doesn't work, so what is the justification for assigning a probability to this situation? A shift in viewpoint is required to understand the basis of software reliability.

The software minefield

An analogy may be helpful. Picture a large field where landmines are tested. Over the years, many mines have been laid in the field, without any apparent plan, and forgotten. Now a person comes upon the field, knowing only that there may be live mines somewhere, and wishes to assess the chances of safely walking on the ground. The only sensible way to do this is to sample the terrain. The person might throw rocks onto a number of locations. If there are no explosions, is it likely to be safe? It is possible to quantify both the degree of safety, and a confidence in its estimate, in terms of the number of probes made. (For details, see the definitions and equations.) The more that is known about how the person intends to walk, the better the probes can be.
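The way safety and confidence grow with the number of probes can be sketched with the standard zero-failure calculation: if every one of N independent probes survives, the confidence that the true failure probability is below some bound θ is 1 − (1 − θ)^N. The function names below are illustrative, not from the text.

```python
import math

def confidence(n_probes: int, failure_bound: float) -> float:
    """Confidence that the true failure probability is below
    `failure_bound`, given `n_probes` independent probes (rocks,
    or tests) with no failures observed."""
    return 1.0 - (1.0 - failure_bound) ** n_probes

def probes_needed(confidence_goal: float, failure_bound: float) -> int:
    """Number of failure-free probes required to claim, with the given
    confidence, that the failure probability is below `failure_bound`."""
    return math.ceil(math.log(1.0 - confidence_goal)
                     / math.log(1.0 - failure_bound))

# For example, 99% confidence that fewer than 1 in 1000 locations
# is mined requires on the order of 4600 quiet probes.
n = probes_needed(0.99, 0.001)
```

Note how unforgiving the arithmetic is: tightening the failure bound by a factor of ten multiplies the number of required probes by roughly ten as well.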

The analogy is clear: the minefield is the input space of a program, and the mines are the inputs that will result in program failures. The rocks thrown are software tests, and an absence of explosions corresponds to an absence of test failures. The "probable incorrectness" of the field is the probability that a rock thrown at random will set off an explosion. The whole thing is perfectly determined in advance, yet assessing the situation with a statistical model seems the only possibility. In the analogy, knowledge about the path to be walked corresponds to the operational profile of expected program usage. When the path is known, it is obviously better to throw all of the rocks along that path. For the program, the "probable incorrectness" is technically the failure probability for an input chosen at random according to the operational profile -- that is, chosen in a way representative of expected usage.
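Sampling "along the path" can be made concrete: draw test inputs according to the operational profile and count failures. The sketch below assumes a profile represented as weighted input-class samplers; the program and profile here are placeholders for a real system under test.

```python
import random

def profile_test(program, profile, n_tests, rng=None):
    """Estimate a program's failure probability by drawing test inputs
    according to an operational profile.

    `profile` is a list of (sampler, weight) pairs: each sampler draws
    one input from a usage class, and the weights give each class's
    share of expected use.  `program` is any callable returning True
    on success.  Both are hypothetical stand-ins."""
    rng = rng or random.Random()
    samplers, weights = zip(*profile)
    failures = 0
    for _ in range(n_tests):
        draw = rng.choices(samplers, weights=weights)[0]
        if not program(draw(rng)):
            failures += 1
    return failures / n_tests

# Toy example: a "program" that fails on negative inputs, under a
# profile in which 10% of expected use supplies negative values.
toy_program = lambda x: x >= 0
toy_profile = [(lambda r: r.uniform(0.0, 1.0), 0.9),
               (lambda r: r.uniform(-1.0, -0.001), 0.1)]
rate = profile_test(toy_program, toy_profile, 2000, random.Random(0))
```

The estimate `rate` converges on the true profile-weighted failure probability (about 0.1 here) as `n_tests` grows; inputs the profile never generates contribute nothing, just as mines off the walked path do not matter.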

Of course, the apparent statistical properties of minefields (and by analogy, programs) depend on how the mines are distributed and how the samples are taken. There are obviously pathological cases in which the statistical approach is unwise. For example, if the mines are concentrated in a small region that the path to be walked crosses, and the rocks are thrown in a uniform pattern too sparse to find that region, all bets are off.
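The pathological case is easy to exhibit in a toy model (entirely illustrative, not from the text): cluster the mines in a tiny patch of a unit-square input space, and compare a sparse uniform grid of probes against probes concentrated on a path that crosses the patch.

```python
def is_mine(x, y):
    """A clustered 'minefield': failures occupy only a tiny square
    (0.04% of the unit-square input space)."""
    return 0.50 <= x <= 0.52 and 0.50 <= y <= 0.52

# A sparse uniform 4x4 grid of probes over the whole square:
grid = [(i + 0.5) / 4 for i in range(4)]        # 0.125, 0.375, 0.625, 0.875
grid_hits = sum(is_mine(x, y) for x in grid for y in grid)

# The same budget spent along the path y = x, which crosses the cluster:
path_hits = sum(is_mine(t, t) for t in (i / 100 for i in range(100)))
```

The uniform grid finds nothing, so it would certify the field as safe; the path-directed probes strike the cluster repeatedly. This is the quantitative case for sampling by the operational profile rather than uniformly.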