Available Expressions

An expression is *available* at quad n if
  (a) on every path from the entry node to n, the expression is computed at least once; and
  (b) on each such path, no variable used in the expression is redefined after the last computation of the expression.

Application: If an expression is available at a node where it is being recomputed, the recomputation can be replaced by a variable holding the result of the previous computation.

Example:

    1      g <- x + y
    2      i <- x - y
    3  L:  r <- x + y
    4      s <- x - y
    5      x <- x + 1
    6      h <- x + y
    7      if ... goto L

Here x+y is available at node 3, because it has already been computed, either in node 1 (as g) or in node 6 (as h), and because the only redefinition of x or y is in node 5, which precedes node 6. On the other hand, x-y is not available at node 4, because if control comes to node 4 from the bottom of the loop, x-y has not been recomputed since x changed.

To avoid recomputation, we introduce a new variable, say p, and modify the program as follows:

           p <- x + y
           g <- p
           i <- x - y
       L:  r <- p
           s <- x - y
           x <- x + 1
           p <- x + y
           h <- p
           if ... goto L

In many cases, the copy quads can be removed by later optimization passes.

Computing Available Expressions

Again, we set this up as a dataflow problem, computing sets of available expressions at the entry and exit of each node.

    gen[s]  = expressions computed by s
    kill[s] = expressions whose values are changed by s

    s                    gen[s]                 kill[s]
    t <- b bop c         {b bop c} - kill[s]    expressions containing t
    t <- M[b]            {M[b]} - kill[s]       expressions containing t
    t <- b               {}                     expressions containing t
    M[a] <- b            {}                     expressions of form M[q] for *any* q
    t <- f(a1,...,an)    {}                     expressions containing t or of form M[q] for *any* q

Note: The severe kill clauses for memory accesses are due to the possibility of *aliasing*. We might think that M[a] <- b should only kill M[a]. But suppose we have two memory pointer variables a and b, with a = b. Then after the sequence

    t <- M[a]
    M[b] <- u

M[a] is no longer available: the store through b may have changed the contents of M[a], so t can no longer stand in for it.

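The gen and kill clauses in the table above can be sketched in code. This is only an illustration: the tuple encoding of quads and expressions and the function names (killed_by, kill, gen) are assumptions made for this sketch, not part of the notes.

```python
# Hypothetical encoding (an assumption of this sketch):
#   statements:  ("binop", t, op, b, c)   t <- b op c
#                ("load",  t, b)          t <- M[b]
#                ("move",  t, b)          t <- b
#                ("store", a, b)          M[a] <- b
#                ("call",  t, args)       t <- f(a1,...,an)
#   expressions: (op, b, c) for b op c, and ("M", b) for M[b]

def is_mem(expr):
    """True for expressions of the form M[q]."""
    return expr[0] == "M"

def operands(expr):
    """Names (variables or constants) appearing in an expression."""
    return set(expr[1:])

def killed_by(stmt, expr):
    """Does executing stmt invalidate a previously computed expr?"""
    kind = stmt[0]
    if kind == "store":
        # No alias information: any store may change any M[q].
        return is_mem(expr)
    t = stmt[1]  # the variable defined by stmt
    if t in operands(expr):
        return True
    # A call may also write memory.
    return kind == "call" and is_mem(expr)

def kill(stmt, universe):
    """kill[s], restricted to a given universe of expressions."""
    return {e for e in universe if killed_by(stmt, e)}

def gen(stmt):
    """gen[s]: what s computes, minus anything s itself kills."""
    kind = stmt[0]
    if kind == "binop":
        _, t, op, b, c = stmt
        computed = {(op, b, c)}
    elif kind == "load":
        _, t, b = stmt
        computed = {("M", b)}
    else:
        computed = set()  # moves, stores and calls generate nothing
    return {e for e in computed if not killed_by(stmt, e)}

# Node 5 of the example: x <- x + 1 generates nothing and kills all of A.
A = {("+", "x", "y"), ("-", "x", "y"), ("+", "x", "1")}
incr = ("binop", "x", "+", "x", "1")
print(gen(incr))            # set()
print(kill(incr, A) == A)   # True
```

With this encoding, a store such as ("store", "b", "u") kills ("M", "a") even though the names a and b differ, which is the conservative treatment of aliasing described in the note.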
The clauses above are appropriate if we know *nothing* about the aliasing behavior of the program, i.e., they treat the entire memory as a single variable. If we know more, via an *alias analysis*, we can kill fewer expressions.

Here are the dataflow equations for available expressions at the beginning (in) and end (out) of each node. Let pred[n] = set of predecessors of n in the CFG.

    in[n]  = intersection over all p in pred[n] of out[p]
    out[n] = gen[n] U (in[n] - kill[n])

To solve these equations we must start with the optimistic assumption that all the in and out sets are *full* (contain all possible expressions), and then iterate to reach a *greatest fixed point*, i.e., the largest sets that solve the equations. (If we started with empty sets, we'd never get anywhere, due to the intersection operation in in[].) The one exception is the entry node: it has no predecessors, and its in set is empty, since nothing has been computed yet.

Here's the example. Let A = the set of all potentially interesting expressions, namely {x+y,x-y,x+1}. A dash (-) denotes the empty set.

                               iteration 0    iteration 1        iteration 2        iteration 3
    n  pred[n] gen[n] kill[n]  in[n] out[n]   in[n]    out[n]    in[n]    out[n]    in[n]    out[n]
    1  -       x+y    -        -     A        -        x+y       -        x+y       -        x+y
    2  1       x-y    -        A     A        x+y      x+y,x-y   x+y      x+y,x-y   x+y      x+y,x-y
    3  2,7     x+y    -        A     A        x+y,x-y  x+y,x-y   x+y      x+y       x+y      x+y
    4  3       x-y    -        A     A        x+y,x-y  x+y,x-y   x+y      x+y,x-y   x+y      x+y,x-y
    5  4       -      A        A     A        x+y,x-y  -         x+y,x-y  -         x+y,x-y  -
    6  5       x+y    -        A     A        -        x+y       -        x+y       -        x+y
    7  6       -      -        A     A        x+y      x+y       x+y      x+y       x+y      x+y

Liveness Analysis

A variable is *live* at a quad if its current value will be used later in the computation. Applications of liveness include register allocation (only live variables need registers) and dead code removal (if a variable is not live immediately after an assignment is made to it, then the assignment is useless and can be removed).

Define liveness using dataflow analysis:

    gen[n]  = set of variables used by node n
    kill[n] = set of variables defined by node n

    s                      gen[s]           kill[s]
    t <- b bop c           {b,c}            {t}
    t <- M[b]              {b}              {t}
    M[a] <- b              {a,b}            {}
    if a relop b then L    {a,b}            {}
    t <- f(a1,...,an)      {a1,...,an}      {t}

Note that the flow equations for this problem are *backwards*, i.e., data flows in the reverse direction from control flow. Let succ[n] = set of successors of node n in the CFG.

    in[n]  = gen[n] U (out[n] - kill[n])
    out[n] = union over all s in succ[n] of in[s]

Example:

    1      a <- 0
    2  L:  b <- a + 1
    3      c <- c + b
    4      a <- b * 2
    5      if a < N goto L
    6      f(c)

Assume that a, b, c are local variables not used after the termination of this code fragment.

Here's a solution. The dataflow equations are solved as usual, starting this time from empty sets, but it is more efficient (takes fewer iterations) to fill them in in (roughly) reverse execution order, computing out[] before in[]. A dash (-) denotes the empty set.

                               iteration 0    iteration 1    iteration 2    iteration 3
    n  succ[n] gen[n] kill[n]  out[n] in[n]   out[n] in[n]   out[n] in[n]   out[n] in[n]
    6  -       c      -        -      -       -      c       -      c       -      c
    5  2,6     a      -        -      -       c      a,c     a,c    a,c     a,c    a,c
    4  5       b      a        -      -       a,c    b,c     a,c    b,c     a,c    b,c
    3  4       b,c    c        -      -       b,c    b,c     b,c    b,c     b,c    b,c
    2  3       a      b        -      -       b,c    a,c     b,c    a,c     b,c    a,c
    1  2       -      a        -      -       a,c    c       a,c    c       a,c    c

Note that in this example, no more than two of {a,b,c} are ever simultaneously live, so two registers will suffice to hold these variables at all times.
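
The iteration in the table can be sketched as a short program. This is a minimal illustration under stated assumptions: the dictionary encoding of the CFG and the name liveness are inventions of this sketch, and N is treated as a constant (as the gen column for node 5 suggests), so it is not tracked as a variable.

```python
# Each node maps to (gen, kill, successors); node numbers follow the example.
# Assumption of this sketch: N is a constant, so it does not appear in gen.
nodes = {
    1: (set(),      {"a"}, [2]),     # a <- 0
    2: ({"a"},      {"b"}, [3]),     # L: b <- a + 1
    3: ({"b", "c"}, {"c"}, [4]),     # c <- c + b
    4: ({"b"},      {"a"}, [5]),     # a <- b * 2
    5: ({"a"},      set(), [2, 6]),  # if a < N goto L
    6: ({"c"},      set(), []),      # f(c)
}

def liveness(nodes):
    """Iterate the backward equations from empty sets until nothing changes."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        # Visit nodes in reverse order, computing out[] before in[].
        for n in sorted(nodes, reverse=True):
            gen, kill, succ = nodes[n]
            out = set().union(*(live_in[s] for s in succ))
            inn = gen | (out - kill)
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out, passes

live_in, live_out, passes = liveness(nodes)
for n in sorted(nodes):
    print(n, sorted(live_in[n]), sorted(live_out[n]))
print("passes:", passes)
```

On this input the final sets agree with the iteration 3 columns of the table, and the loop stops after the third pass, which is the one that detects no change. The largest set has two elements, matching the observation that two registers suffice.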