compver

Compiler Verification

(Thanks to Pierce, "Software Foundations.")

Common notions

Require Import Arith.
Require Import List.
Import ListNotations.
Ltac inv H := inversion H; subst; clear H.

Optional outcomes

Inductive maybe (T:Type) : Type :=
| OK : T -> maybe T
| Error : maybe T.

Arguments OK {T} _.
Arguments Error {T}.

Identifiers

Inductive id : Type :=
Id : nat -> id.

Definition beq_id x1 x2 :=
  match (x1, x2) with
    (Id n1, Id n2) => beq_nat n1 n2
  end.

A few specific identifiers for use in examples.

Definition X : id := Id 0.
Definition Y : id := Id 1.
Definition Z : id := Id 2.

States

Definition state := id -> nat.

Definition initial_state : state := fun _ => 0.

Definition update (st : state) (x : id) (n : nat) : state :=
fun x' => if beq_id x x' then n else st x'.

A randomly chosen example state: X = 100 Y = 20 Z = 2

Definition st1 := update (update (update initial_state X 100) Y 20) Z 2.

Source language: arithmetic expressions.

Definition aexp_eq : forall (e1 e2: aexp), {e1 = e2} + {~e1 = e2}.
repeat (decide equality). Defined.

Semantics of the source language, given in a "denotational" style.

N.B. If our source language allowed non-terminating computations or was non-deterministic, we'd probably need to give a relational description instead, e.g. in the following big-step style:

Inductive aevalR (st : state) : aexp -> nat -> Prop :=
  | E_ANum : forall (n:nat),
      aevalR st (ANum n) n
  | E_AId : forall (x:id),
      aevalR st (AId x) (st x)
  | E_APlus : forall (a1 a2: aexp) (n1 n2 : nat),
      aevalR st a1 n1 ->
      aevalR st a2 n2 ->
      aevalR st (APlus a1 a2) (n1 + n2)
  | E_AMinus : forall (a1 a2: aexp) (n1 n2 : nat),
      aevalR st a1 n1 ->
      aevalR st a2 n2 ->
      aevalR st (AMinus a1 a2) (n1 - n2)
  | E_AMult : forall (a1 a2: aexp) (n1 n2 : nat),
      aevalR st a1 n1 ->
      aevalR st a2 n2 ->
      aevalR st (AMult a1 a2) (n1 * n2).

It is easy to show that these are equivalent...

Theorem aevalR_aeval : forall st e, aevalR st e (aeval st e).

Proof. induction e; constructor; auto. Qed.

...so for convenience we'll continue to use the functional version.

An example program

Definition ex1 := (AMinus (AId X) (AMult (ANum 2) (APlus (ANum 0) (AId Y)))).

We can evaluate the denotation of a program directly inside Coq.

Eval compute in (aeval st1 ex1).

(* ===> 60:nat *)

The target language: stack machine instructions

HP Calculators, programming languages like Forth and Postscript, and abstract machines like the Java Virtual Machine all evaluate arithmetic expressions using a stack. For instance, the expression

   (2*3)+(3*(4-2))

would be entered as

   2 3 * 3 4 2 - * +

and evaluated like this:

  []            |    2 3 * 3 4 2 - * +
  [2]           |    3 * 3 4 2 - * +
  [3; 2]        |    * 3 4 2 - * +
  [6]           |    3 4 2 - * +
  [3; 6]        |    4 2 - * +
  [4; 3; 6]     |    2 - * +
  [2; 4; 3; 6]  |    - * +
  [2; 3; 6]     |    * +
  [6; 6]        |    +
  [12]          |

We show how to write a small compiler that translates aexps into stack machine instructions.

The instruction set for our stack language will consist of the following instructions:

SPush n: Push the number n on the stack.
SLoad x: Load the identifier x from the store and push it on the stack
SPlus: Pop the two top numbers from the stack, add them, and push the result onto the stack.
SMinus: Similar, but subtract.
SMult: Similar, but multiply.

Semantics of the target machine

Fixpoint s_execute (st : state) (stack : list nat) (prog : sprog)
                 : list nat :=
  match (prog, stack) with
  | (nil, _ ) => stack
  | (SPush n::prog', _ ) => s_execute st (n::stack) prog'
  | (SLoad x::prog', _ ) => s_execute st (st x::stack) prog'
  | (SPlus::prog', n::m::stack') => s_execute st ((m+n)::stack') prog'
  | (SMinus::prog', n::m::stack') => s_execute st ((m-n)::stack') prog'
  | (SMult::prog', n::m::stack') => s_execute st ((m*n)::stack') prog'
  | (_::prog', _ ) => s_execute st stack prog' (* Bad state: skip *)
  end.

(Again, we could alternatively give this semantics as a step relation.)

Compilers and Correctness

Now we are in a position to define what a compiler is and what it means for a compiler to be correct.

A compiler is just a function from source code to target code, allowing for the possibility of failure.

Definition compiler := aexp -> maybe sprog.

If the compiler succeeds, then the observed behavior of target should match that of source.

In this case, all we can "observe" is the overall value of the expression and the final result left on the machine stack.

Definition compiler_ok (comp : compiler) :=
  forall (e:aexp) (st:state) (p: sprog),
    comp e = OK p ->
    s_execute st [] p = [ aeval st e ].

A simple initial compiler

Fixpoint s_compile_r (e : aexp) : sprog :=
  match e with
  | ANum n => [SPush n]
  | AId x => [SLoad x]
  | APlus a1 a2 => s_compile_r a1 ++ s_compile_r a2 ++ [SPlus]
  | AMinus a1 a2 => s_compile_r a1 ++ s_compile_r a2 ++ [SMinus]
  | AMult a1 a2 => s_compile_r a1 ++ s_compile_r a2 ++ [SMult]
  end.

Definition s_compile (e : aexp) : maybe sprog :=
  OK (s_compile_r e).

Again, we can evaluate the result of compilation directly inside Coq.

Definition p1 := s_compile ex1.
Eval compute in p1.

(* ===> OK [SLoad X; SPush 2; SPush 0; SLoad Y; SPlus; SMult; SMinus] *)

Coq also supports "extraction" of functions and datatypes to a more efficiently executable language such as OCaml.

Recursive Extraction s_compile.

(see compver_extract.ml)

So properties we prove about s_compile apply directly to the "real" compiler generated by extraction (assuming we trust the extraction process, of course).

Proof of Correctness.

We can give a simple direct proof of compiler correctness. The proof is by induction on the form of aexp being compiled. The only trick is that we must generalize the induction hypothesis to consider arbitrary stacks and code sequences.

Lemma execute_eval_comm : forall st e rest stack,
s_execute st stack (s_compile_r e ++ rest) =
s_execute st (aeval st e :: stack) rest.

Proof.
  induction e;
    try reflexivity;
    intros; simpl; rewrite app_ass; rewrite IHe1;
      rewrite app_ass; rewrite IHe2; reflexivity.
Qed.

Theorem s_compile_ok: compiler_ok s_compile.

Proof.
  unfold compiler_ok. intros.
   unfold s_compile in H. inv H.
   replace (s_compile_r e) with (s_compile_r e ++ []) by auto with datatypes.
   rewrite execute_eval_comm. auto. Qed.

A self-checking compiler.

This compiler performs translation validation on its output to check if it is correct. Checking is conservative: it might say "no" even if the output is correct.

This particular checker will work by decompiling the output to an aexp and comparing that to the original input.

The decompiler, in turn, works by symbolic execution of the program.

Fixpoint sym_execute (stack : list aexp) (prog : sprog) : list aexp :=
  match (prog, stack) with
  | (nil, _ ) => stack
  | (SPush n::prog', _ ) => sym_execute (ANum n::stack) prog'
  | (SLoad x::prog', _ ) => sym_execute (AId x::stack) prog'
  | (SPlus::prog', n::m::stack') => sym_execute ((APlus m n)::stack') prog'
  | (SMinus::prog', n::m::stack') => sym_execute ((AMinus m n)::stack') prog'
  | (SMult::prog', n::m::stack') => sym_execute ((AMult m n)::stack') prog'
  | (_::prog', _ ) => sym_execute stack prog' (* Bad state: skip *)
  end.

Definition decompile (prog : sprog) : maybe aexp :=
  match sym_execute [] prog with
  | aexp::nil => OK aexp
  | _ => Error
  end.

Eval compute in decompile [SLoad X; SPush 2; SPush 0; SLoad Y; SPlus; SMult; SMinus].

(* ===> OK (AMinus (AId X) (AMult (ANum 2) (APlus (ANum 0) (AId Y)))) *)

Definition s_compile_check (e : aexp) : maybe sprog :=
  match s_compile e with
  | OK p =>
       match decompile p with
       | OK e' => if aexp_eq e e' (* a very simplistic check !!! *)
                    then OK p
                    else Error
       | Error => Error
       end
  | Error => Error
  end.

Eval compute in s_compile_check ex1.

(* ===> OK [SLoad X; SPush 2; SPush 0; SLoad Y; SPlus; SMult; SMinus] *)

We would like to verify that s_compile_check is ok, in other words that our checker is sound.

First we show that symbolic execution agrees with real execution.

Inductive stacks_agree (st:state) : list aexp -> list nat -> Prop :=
| sa_nil : stacks_agree st nil nil
| sa_cons : forall e n es ns, aeval st e = n ->
                              stacks_agree st es ns ->
                              stacks_agree st (e::es) (n::ns).
Hint Constructors stacks_agree.

Lemma sym_execute_correct: forall st p ss s,
  stacks_agree st ss s ->
  stacks_agree st (sym_execute ss p) (s_execute st s p).
Proof.
  induction p. auto.
  simpl; intros; destruct a; eauto; inv H; eauto; inv H1; eauto. Qed.

Lemma decompile_correct : forall st p e,
decompile p = OK e -> s_execute st [] p = [aeval st e].

Proof.
  unfold decompile. intros.
  destruct (sym_execute [] p) eqn:E.
   inv H.
   destruct l; inv H.
   assert (stacks_agree st [e] (s_execute st [] p)).
     rewrite <- E. apply sym_execute_correct. econstructor.
   inv H. inv H4. auto. Qed.

Now we can show that our checking compiler is correct.

Theorem s_compile_check_ok: compiler_ok (s_compile_check).

Proof.
  unfold compiler_ok, s_compile_check. intros.
  destruct (s_compile e); inv H.
  destruct (decompile s) eqn:E; inv H1.
  destruct (aexp_eq e a); subst; inv H0.
  apply decompile_correct; auto. Qed.

Adding Optimization

So far our two proof approaches get equivalent results, except that the checking compiler might fail-stop.

The proof effort is also roughly the same, although the translation validation approach is a bit more complicated because we have to define the checking machinery.

In other cases, however, proving correctness of a checker might be substantially easier than a direct proof.

Now let's introduce a simple optimization into the compiler. We'll write this as an initial source-to-source transformation on input terms. This transformation simply replaces any expression of the form 0+e with e.

Fixpoint optimize_0plus (e:aexp) : aexp :=
  match e with
  | ANum n =>
      ANum n
  | AId x =>
      AId x
  | APlus (ANum 0) e2 =>
      optimize_0plus e2
  | APlus e1 e2 =>
      APlus (optimize_0plus e1) (optimize_0plus e2)
  | AMinus e1 e2 =>
      AMinus (optimize_0plus e1) (optimize_0plus e2)
  | AMult e1 e2 =>
      AMult (optimize_0plus e1) (optimize_0plus e2)
  end.

Now here is a compiler incorporating this optimization. Think of this as a very simple multi-stage compiler: source -> optimized source -> target

Definition opt_compile (e : aexp) : maybe (list sinstr) :=
s_compile (optimize_0plus e).

To do a direct proof of correctness for the optimizing compiler, we can examine the behavior of the optimization and prove that it preserves behavior.

Theorem optimize_0plus_sound: forall st e,
aeval st (optimize_0plus e) = aeval st e.

Proof.
  intros st e. induction e; simpl; auto.
    destruct e1; simpl; auto.
      destruct n; simpl; auto. Qed.

Based on this knowledge, we can reprove correctness of the whole compiler chain.

Theorem opt_compiler_ok : compiler_ok opt_compile.

Proof.
unfold compiler_ok, opt_compile. intros.
rewrite <- optimize_0plus_sound. apply s_compile_ok. auto. Qed.

We can build a checking compiler just as before.

Definition opt_compile_check (e : aexp) : maybe (list sinstr) :=
  match opt_compile e with
  | OK p =>
     match decompile p with
     | OK e' => if aexp_eq e e' then OK p else Error
     | Error => Error
     end
  | Error => Error
  end.

And the same proof shows that this compiler is correct by our definition.

Theorem opt_compile_check_ok: compiler_ok opt_compile_check.

Proof.
  unfold compiler_ok, opt_compile_check. intros.
  destruct (opt_compile e); inv H.
  destruct (decompile l) eqn:E; inv H1.
  destruct (aexp_eq e a); subst; inv H0.
  apply decompile_correct; auto. Qed.

But this compiler fails if the optimizer kicks in, because the decompiled code is not exactly identical to what we started with.

Eval compute in (s_compile ex1).

(*  ===> OK [SLoad (Id 0); SPush 2; SPush 0; SLoad (Id 1); SPlus; SMult; SMinus] *)

Eval compute in (opt_compile ex1).

(*  ===> OK [SLoad (Id 0); SPush 2; SLoad (Id 1); SMult; SMinus] *)

Eval compute in (s_compile_check ex1).

(*  ===> OK [SLoad (Id 0); SPush 2; SPush 0; SLoad (Id 1); SPlus; SMult; SMinus] *)

Eval compute in (opt_compile_check ex1). (* oops! *)

(*  ===> Error *)

The problem is that we have been very simplistic in writing our checking function. It is obviously not reasonable to expect that the result of decompilation will be syntactically identical to the program we started with.

There are two general ways to address this problem.

(1) If we assume that we can see (at least part of) the internals of the compiler, we can use knowledge about how the compiler works to predict how the decompiled output will differ from the original, and adjust the comparison accordingly.

In the current simple situation, it suffices to run the optimize_0plus function over the original program before doing the comparison.

(2) If we assume that the compiler is a black box, we can still attempt to normalize the original program and the decompiled one before comparing.

In the current situation, this might mean normalizing arithmetic expressions into sorted polynomials. Obviously, this is a lot more work, but it has the advantage of being insensitive to the precise behavior of the compiler.

Note that this normalization approach can be applied to program representations other than the source language. For example, source and target might both be translated into a third language before being compared.