Available Expressions
An expression is *available* at quad n if
(a) on every path from the entry node to n, the expression is computed at least once; and
(b) on each such path, no variable used in the expression is redefined after
the last computation of the expression.
Application: If an expression is available at a node where it is being recomputed,
it is possible to replace the recomputation by
a variable representing the result of the previous computation.
Example:
1 g <- x + y
2 i <- x - y
3 L: r <- x + y
4 s <- x - y
5 x <- x + 1
6 h <- x + y
7 if ... goto L
Here x+y is available at node 3 because it has already been computed on every path
into node 3: in node 1 (as g) when control enters from above, and in node 6 (as h)
when control comes around the loop. Moreover, the only redefinition of x or y is in
node 5, which precedes node 6. On the other hand, x-y is not available at node 4:
if control reaches node 4 from the bottom of the loop, x-y has not been recomputed
since x changed in node 5.
To avoid recomputation, we introduce a new variable, say p, and modify the program
as follows:
p <- x + y
g <- p
i <- x - y
L: r <- p
s <- x - y
x <- x + 1
p <- x + y
h <- p
if ... goto L
In many cases, the copy quads can be removed by later optimization passes.
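The rewrite itself is mechanical once we know where the expression is available. The following is a minimal Python sketch, assuming the availability information (computed as described next) is already in hand. The (label, dst, rhs) encoding of quads and the names `eliminate`, `available_at`, and `temp` are hypothetical, chosen only for illustration. Every computation of the expression is routed through the temporary, and a recomputation at a node where the expression is available becomes a copy from the temporary.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical quad encoding: (label, destination, right-hand side).
# dst is None for the branch quad, whose text is kept verbatim.
Quad = Tuple[Optional[str], Optional[str], str]

def eliminate(quads: List[Quad], expr: str, available_at: Set[int],
              temp: str = "p") -> List[Quad]:
    """Route every computation of `expr` through the fresh variable `temp`.

    `available_at` holds the 1-based positions of quads at which `expr`
    is already available; those recomputations become copies from `temp`.
    """
    result: List[Quad] = []
    for pos, (label, dst, rhs) in enumerate(quads, start=1):
        if dst is None or rhs != expr:
            result.append((label, dst, rhs))       # unrelated quad: keep it
        elif pos in available_at:
            result.append((label, dst, temp))      # reuse the earlier result
        else:
            result.append((label, temp, expr))     # compute into the temporary
            result.append((None, dst, temp))       # copy quad
    return result

program: List[Quad] = [
    (None, "g", "x + y"),
    (None, "i", "x - y"),
    ("L", "r", "x + y"),
    (None, "s", "x - y"),
    (None, "x", "x + 1"),
    (None, "h", "x + y"),
    (None, None, "if ... goto L"),
]

for label, dst, rhs in eliminate(program, "x + y", available_at={3}):
    prefix = f"{label}: " if label else ""
    print(prefix + (f"{dst} <- {rhs}" if dst else rhs))
```

Printing the result gives exactly the rewritten program above. Using one temporary per expression keeps the scheme simple: because every computation of x+y also writes p, p holds the value of x+y wherever x+y is available.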
Computing Available Expressions
Again, we set this up as a dataflow problem, computing the set of available expressions
at the entry and exit of each node.
gen[s] = expression computed by s
kill[s] = expressions whose values are changed by s
s                    gen[s]                  kill[s]
t <- b bop c         {b bop c} - kill[s]     expressions containing t
t <- M[b]            {M[b]} - kill[s]        expressions containing t
t <- b               {}                      expressions containing t
M[a] <- b            {}                      expressions of the form M[q], for *any* q
t <- f(a1,...,an)    {}                      expressions containing t, or of the form M[q] for *any* q
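The table translates almost line for line into code. This Python sketch assumes a hypothetical encoding in which quads are tagged tuples and expressions are tuples such as ("+", "x", "y") for x+y and ("M", "b") for M[b]; `universe` is the set of all expressions of interest, and kill sets are computed relative to it.

```python
from typing import Set, Tuple

# Hypothetical encoding: ("+", "x", "y") stands for x+y, ("M", "b") for M[b].

def mentions(e: tuple, t: str) -> bool:
    """True if variable t is an operand of expression e."""
    return t in e[1:]

def is_mem(e: tuple) -> bool:
    return e[0] == "M"

def gen_kill(quad: tuple, universe: Set[tuple]) -> Tuple[Set[tuple], Set[tuple]]:
    kind = quad[0]
    if kind == "binop":                                # t <- b bop c
        _, t, op, b, c = quad
        gen = set() if t in (b, c) else {(op, b, c)}   # {b bop c} - kill[s]
        kill = {e for e in universe if mentions(e, t)}
    elif kind == "load":                               # t <- M[b]
        _, t, b = quad
        gen = set() if t == b else {("M", b)}          # {M[b]} - kill[s]
        kill = {e for e in universe if mentions(e, t)}
    elif kind == "move":                               # t <- b
        _, t, _b = quad
        gen, kill = set(), {e for e in universe if mentions(e, t)}
    elif kind == "store":                              # M[a] <- b: any M[q] may change
        gen, kill = set(), {e for e in universe if is_mem(e)}
    elif kind == "call":                               # t <- f(a1,...,an)
        _, t, _f, _args = quad
        gen = set()
        kill = {e for e in universe if mentions(e, t) or is_mem(e)}
    else:
        raise ValueError(f"unknown quad kind: {kind!r}")
    return gen, kill

A = {("+", "x", "y"), ("-", "x", "y"), ("+", "x", 1)}
print(gen_kill(("binop", "x", "+", "x", 1), A))   # node 5: gen is empty, kill is all of A
```

Note that for a quad such as x <- x + 1 the expression on the right contains the destination, so it is not generated: after the assignment, x + 1 no longer refers to the value just computed.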
Note:
The severe kill clauses for memory accesses are due to the possibility of *aliasing*.
We might think that M[a] <- b should kill only M[a]. But suppose a and b are two
pointer variables holding the same address (a = b). Then after the sequence
t <- M[a]
M[b] <- u
M[a] is no longer available!
The clauses above are appropriate if we know *nothing* about the aliasing behavior
of the program; in effect, the analysis treats the entire memory as a single variable.
If we know more, via an *alias analysis*, we can kill fewer expressions.
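As an illustration of how alias information could reduce kills, this Python sketch refines only the store rule. It assumes a hypothetical `may_alias(p, q)` oracle supplied by some alias analysis (none is specified here); with an oracle that always answers True, it reproduces the conservative rule from the table.

```python
from typing import Callable, Set

# Hypothetical encoding: ("M", "a") stands for the expression M[a].

def store_kill(addr: str, universe: Set[tuple],
               may_alias: Callable[[str, str], bool]) -> Set[tuple]:
    """Expressions killed by M[addr] <- b, given a may-alias oracle."""
    return {e for e in universe
            if e[0] == "M" and (e[1] == addr or may_alias(addr, e[1]))}

U = {("M", "a"), ("M", "b"), ("M", "c")}

# Knowing nothing: every memory expression is killed.
print(store_kill("b", U, lambda p, q: True))

# Hypothetical alias facts: a and b may share a cell; c is known to be separate.
facts = {frozenset({"a", "b"})}
print(store_kill("b", U, lambda p, q: frozenset({p, q}) in facts))
```

In the second case M[c] survives the store, while M[a] is still killed, exactly as the aliasing example requires.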
Here are the dataflow equations for available expressions at the beginning (in)
and end (out) of each node. Let
pred[n] = set of predecessors of n in the CFG.
in[n] = intersection over all p in pred[n] of out[p]   (for the entry node, in[n] = {}: nothing is available on entry)
out[n] = gen[n] U (in[n] - kill[n])
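To solve these equations we must start with the optimistic assumption that
all the in and out sets (except the in set of the entry node, which is empty) are
*full* (contain all possible expressions), and then iterate to reach a
*greatest fixed point*, i.e., the largest sets that solve the equations.
(If we started with empty sets, the intersection in in[] at a loop header would
discard any expression not already known to arrive along the back edge. An expression
computed before the loop and carried around it unchanged would never be recovered,
so we would settle on a smaller, overly pessimistic solution.)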
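To see why the starting point matters, here is a small Python sketch of round-robin iteration over these equations, run on a hypothetical three-node loop that is not from the notes: 1 t <- a + b; 2 L: u <- c; 3 if ... goto L. Nothing in the loop touches a or b, so a+b really is available at node 2. The function name `solve` and its `init` parameter are illustrative choices.

```python
from typing import Dict, List, Set

def solve(nodes: List[int], pred: Dict[int, List[int]],
          gen: Dict[int, Set[str]], kill: Dict[int, Set[str]],
          entry: int, init: Set[str]):
    """Round-robin iteration of the available-expressions equations.

    Every set starts as `init` except in[entry], which is empty.
    Assumes every non-entry node has at least one predecessor.
    """
    inn = {n: set(init) for n in nodes}
    out = {n: set(init) for n in nodes}
    inn[entry] = set()
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = (set.intersection(*(out[p] for p in pred[n]))
                      if n != entry else set())
            new_out = gen[n] | (new_in - kill[n])
            if (new_in, new_out) != (inn[n], out[n]):
                inn[n], out[n] = new_in, new_out
                changed = True
    return inn, out

nodes = [1, 2, 3]
pred = {1: [], 2: [1, 3], 3: [2]}
gen = {1: {"a+b"}, 2: set(), 3: set()}
kill = {1: set(), 2: set(), 3: set()}
universe = {"a+b"}

greatest, _ = solve(nodes, pred, gen, kill, entry=1, init=universe)
least, _ = solve(nodes, pred, gen, kill, entry=1, init=set())
print(greatest[2], least[2])   # {'a+b'} set()
```

Starting from full sets finds a+b at node 2. Starting from empty sets, out[3] = {} is intersected into in[2], and since nothing in the loop regenerates a+b, it never comes back.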
Here's the example. Let A = the set of all potentially interesting expressions, namely
{x+y,x-y,x+1}.
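                          | iteration 0       | iteration 1       | iteration 2       | iteration 3
n  pred[n] gen[n] kill[n] | in       out      | in       out      | in       out      | in       out
1  -       x+y    -       | -        A        | -        x+y      | -        x+y      | -        x+y
2  1       x-y    -       | A        A        | x+y      x+y,x-y  | x+y      x+y,x-y  | x+y      x+y,x-y
3  2,7     x+y    -       | A        A        | x+y,x-y  x+y,x-y  | x+y      x+y      | x+y      x+y
4  3       x-y    -       | A        A        | x+y,x-y  x+y,x-y  | x+y      x+y,x-y  | x+y      x+y,x-y
5  4       -      A       | A        A        | x+y,x-y  -        | x+y,x-y  -        | x+y,x-y  -
6  5       x+y    -       | A        A        | -        x+y      | -        x+y      | -        x+y
7  6       -      -       | A        A        | x+y      x+y      | x+y      x+y      | x+y      x+y
In the table, A is the full set and - denotes the empty set (in the pred[n] column, no
predecessors). Within each iteration the nodes are processed in order 1 through 7, each
using the most recent out sets of its predecessors. Iteration 3 reproduces iteration 2,
so the solution has converged; as expected, x+y is in in[3] but x-y is not in in[4].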
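The table can be reproduced mechanically. This Python sketch encodes pred[n], gen[n], and kill[n] exactly as in the table, visits nodes 1 through 7 in order within each iteration (so a node sees values already updated earlier in the same iteration), and records one snapshot per iteration until an iteration changes nothing. The names `iterate` and `history` are illustrative.

```python
from typing import Dict, List, Set, Tuple

A = {"x+y", "x-y", "x+1"}
pred = {1: [], 2: [1], 3: [2, 7], 4: [3], 5: [4], 6: [5], 7: [6]}
gen = {1: {"x+y"}, 2: {"x-y"}, 3: {"x+y"}, 4: {"x-y"},
       5: set(), 6: {"x+y"}, 7: set()}
kill = {n: set() for n in pred}
kill[5] = set(A)                    # x <- x + 1 kills every expression containing x

Snapshot = Tuple[Dict[int, Set[str]], Dict[int, Set[str]]]

def iterate() -> List[Snapshot]:
    # Iteration 0: in[1] is empty (entry node), every other set is full.
    inn = {n: (set(A) if pred[n] else set()) for n in pred}
    out = {n: set(A) for n in pred}
    history: List[Snapshot] = []
    while True:
        changed = False
        for n in sorted(pred):      # nodes 1..7 in program order
            new_in = (set.intersection(*(out[p] for p in pred[n]))
                      if pred[n] else set())
            new_out = gen[n] | (new_in - kill[n])
            changed |= (new_in, new_out) != (inn[n], out[n])
            inn[n], out[n] = new_in, new_out
        history.append(({n: set(s) for n, s in inn.items()},
                        {n: set(s) for n, s in out.items()}))
        if not changed:             # this iteration merely confirmed the fixed point
            return history

history = iterate()
print(len(history))                                 # 3
final_in, final_out = history[-1]
print(sorted(final_in[3]), sorted(final_in[4]))     # ['x+y'] ['x+y']
```

The three snapshots correspond to iterations 1 to 3 of the table; the last equals the second, which is how the solver detects convergence.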
Liveness Analysis
A variable is *live* at a quad if its current value may be used later in the computation.
Applications of liveness include register allocation (only live variables need
registers) and dead-code elimination (if a variable is not live immediately after
an assignment to it, then the assignment is useless and can be removed, provided
the right-hand side has no side effects).
We define liveness as a dataflow problem:
gen[n] = set of variables used by node n
kill[n] = set of variables defined by node n
s                     gen[s]         kill[s]
t <- b bop c          {b,c}          {t}
t <- M[b]             {b}            {t}
M[a] <- b             {a,b}          {}
if a relop b then L   {a,b}          {}
t <- f(a1,...,an)     {a1,...,an}    {t}
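A Python sketch of this table follows. The tagged-tuple quad encoding is hypothetical. The copy form t <- b, which node 1 of the example below (a <- 0) needs but the table does not list, is assumed to use b and define t. Operands outside the given set of variables, such as the constants 0 and N, are ignored, matching how the example's table treats them.

```python
from typing import Set, Tuple

def gen_kill(quad: tuple, variables: Set[str]) -> Tuple[Set[str], Set[str]]:
    """gen/kill for liveness, restricted to the variables of interest."""
    kind = quad[0]
    if kind == "binop":                  # t <- b bop c
        _, t, _op, b, c = quad
        used, defined = {b, c}, {t}
    elif kind == "load":                 # t <- M[b]
        _, t, b = quad
        used, defined = {b}, {t}
    elif kind == "move":                 # t <- b (assumed; not in the table)
        _, t, b = quad
        used, defined = {b}, {t}
    elif kind == "store":                # M[a] <- b
        _, a, b = quad
        used, defined = {a, b}, set()
    elif kind == "cjump":                # if a relop b then L
        _, a, _relop, b, _label = quad
        used, defined = {a, b}, set()
    elif kind == "call":                 # t <- f(a1,...,an); t is None if unused
        _, t, _f, args = quad
        used, defined = set(args), ({t} if t is not None else set())
    else:
        raise ValueError(f"unknown quad kind: {kind!r}")
    return used & variables, defined & variables

V = {"a", "b", "c"}
example = [
    ("move", "a", 0),                    # 1  a <- 0
    ("binop", "b", "+", "a", 1),         # 2  L: b <- a + 1
    ("binop", "c", "+", "c", "b"),       # 3  c <- c + b
    ("binop", "a", "*", "b", 2),         # 4  a <- b * 2
    ("cjump", "a", "<", "N", "L"),       # 5  if a < N goto L
    ("call", None, "f", ["c"]),          # 6  f(c)
]
for n, q in enumerate(example, start=1):
    print(n, gen_kill(q, V))
```

The printed sets match the gen[n] and kill[n] columns of the solution table for this example.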
Note that the flow equations for this problem run *backward*: information flows
in the direction opposite to control flow.
Let succ[n] = set of successors of node n in CFG.
in[n] = gen[n] U (out[n] - kill[n])
out[n] = union over all s in succ[n] of in[s]
Example:
1 a <- 0
2 L: b <- a + 1
3 c <- c + b
4 a <- b * 2
5 if a < N goto L
6 f(c)
Assume that a,b,c are local variables not used after the termination of this code fragment.
Here's a solution. The dataflow equations are solved as usual, but it is more efficient (fewer
iterations) to process the nodes in (roughly) reverse execution order, computing out[] before in[]
for each node.
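                          | iteration 0 | iteration 1 | iteration 2 | iteration 3
n  succ[n] gen[n] kill[n] | out   in    | out   in    | out   in    | out   in
6  -       c      -       | -     -     | -     c     | -     c     | -     c
5  2,6     a      -       | -     -     | c     a,c   | a,c   a,c   | a,c   a,c
4  5       b      a       | -     -     | a,c   b,c   | a,c   b,c   | a,c   b,c
3  4       b,c    c       | -     -     | b,c   b,c   | b,c   b,c   | b,c   b,c
2  3       a      b       | -     -     | b,c   a,c   | b,c   a,c   | b,c   a,c
1  2       -      a       | -     -     | a,c   c     | a,c   c     | a,c   c
Here - denotes the empty set (in the succ[n] column, no successors), and rows are listed in
the order they are processed. Iteration 3 reproduces iteration 2, so the solution has converged.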
Note that in this example no more than two of {a,b,c} are ever simultaneously live,
so two registers suffice to hold these variables at all times.
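Both the liveness table and the register count can be checked with a short Python sketch of the backward equations. The succ/gen/kill data are copied from the table; the function name `liveness` and the convention of counting the final, unchanged pass are choices made for this illustration. Visiting the nodes in reverse order reproduces the three iterations shown, whereas visiting them in program order (1 through 6) takes five passes to reach the same solution, which is the efficiency difference noted before the table.

```python
from typing import Dict, List, Set, Tuple

# succ[n], gen[n], kill[n] copied from the liveness table.
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
gen = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}
kill = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}

Sets = Dict[int, Set[str]]

def liveness(order: List[int]) -> Tuple[Sets, Sets, int]:
    """Round-robin solution of the backward liveness equations.

    Nodes are visited in `order`; for each node out[n] is computed before
    in[n]. The pass count includes the final pass in which nothing changes.
    """
    inn: Sets = {n: set() for n in succ}
    out: Sets = {n: set() for n in succ}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for n in order:
            new_out = set().union(*(inn[s] for s in succ[n]))
            new_in = gen[n] | (new_out - kill[n])
            changed |= (new_in, new_out) != (inn[n], out[n])
            inn[n], out[n] = new_in, new_out
    return inn, out, passes

live_in, live_out, reverse_passes = liveness([6, 5, 4, 3, 2, 1])
_, _, forward_passes = liveness([1, 2, 3, 4, 5, 6])
print(reverse_passes, forward_passes)    # 3 5

# Largest number of variables live at the entry or exit of any node.
print(max(len(s) for s in list(live_in.values()) + list(live_out.values())))  # 2
```

Since no in or out set holds more than two variables, two registers are enough for a, b, and c, as stated above.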