Another Available Expressions Example

Example:

   1     g <- x + y
   2     i <- x - y
   3  L: r <- x + y
   4     s <- x - y
   5     x <- x + 1
   6     h <- x + y
   7     if x < 10 goto L

In SSA:

   1     g1 <- x1 + y1
   2     i1 <- x1 - y1
   3  L: x2 <- phi(x1,x3)
   3a    r1 <- x2 + y1
   4     s1 <- x2 - y1
   5     x3 <- x2 + 1
   6     h1 <- x3 + y1
   7     if x3 < 10 goto L

Here we can't find any expressions to remove.

But classically, available expressions are defined on the original CFG, not an
SSA form.  I.e., we make no assumptions about unique names.  For this to work,
we must add "kill" sets to remove available expressions involving variables
that may have been updated.

   gen[s]  = expressions computed by s
   kill[s] = expressions whose values are changed by s

   s                   gen[s]                kill[s]
   t <- b bop c        {b bop c} - kill[s]   expressions containing t
   t <- M[b]           {M[b]} - kill[s]      expressions containing t
   t <- b              {}                    expressions containing t
   M[a] <- b           {}                    expressions of form M[q] for *any* q
   t <- f(a1,...,an)   {}                    expressions containing t or of
                                             form M[q] for *any* q

Note: The severe kill clauses for memory accesses are due to the possibility
of *aliasing*.  We might think that M[a] <- b should only kill M[a].  But
suppose we have two memory pointer variables a and b, with a = b.  Then after
the sequence

   t <- M[a]
   M[b] <- u

M[a] is no longer available!  The clauses above are appropriate if we know
*nothing* about the aliasing behavior of the program, i.e., they treat the
entire memory as a single variable.  If we know more, via an *alias analysis*,
we can kill fewer expressions.

Here are the dataflow equations for available expressions at the beginning
(in) and end (out) of each node.  Let pred[n] = the set of predecessors of n
in the CFG.

   in[n]  = intersection over all p in pred[n] of out[p]
   out[n] = gen[n] U (in[n] - kill[n])

Solving these, we discover that x+y is available at node 3, because it has
already been computed, either in node 1 (as g) or in node 6 (as h), and
because the only redefinition of x or y is in node 5, which precedes node 6.
On the other hand, x-y is not available at node 4, because if control comes
to node 4 from the bottom of the loop, it has not been recomputed since x
changed.

To avoid recomputation, we introduce a new variable, say p, and modify the
program as follows:

      p <- x + y
      g <- p
      i <- x - y
   L: r <- p
      s <- x - y
      x <- x + 1
      p <- x + y
      h <- p
      if x < 10 goto L

In many cases, the copy instructions can be removed by later optimization
passes.

The SSA-based version of available expressions is weaker than the original
version in this case.  We can see that in the SSA version, the only
expressions available within a loop are those defined above the loop entry.
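A small Python sketch (the names and data structures are assumptions, not from
the notes) that encodes the gen/kill sets of the seven-node example and
iterates the two equations above to their greatest fixed point; the same
iteration is tabulated step by step in the last section of these notes.

    # Universe of interesting expressions for this example.
    ALL = frozenset({"x+y", "x-y", "x+1"})

    # gen/kill per node, read off the code above.  Node 5 ("x <- x + 1")
    # redefines x, so it kills every expression mentioning x (all of ALL here),
    # and its own x+1 is killed too, leaving gen[5] empty.
    gen  = {1: {"x+y"}, 2: {"x-y"}, 3: {"x+y"}, 4: {"x-y"},
            5: set(),   6: {"x+y"}, 7: set()}
    kill = {n: set() for n in gen}
    kill[5] = set(ALL)
    pred = {1: [], 2: [1], 3: [2, 7], 4: [3], 5: [4], 6: [5], 7: [6]}

    # Start optimistically with full sets and iterate to the greatest fixed point.
    in_  = {n: set(ALL) for n in gen}
    out_ = {n: set(ALL) for n in gen}
    changed = True
    while changed:
        changed = False
        for n in sorted(gen):
            # in[n] = intersection of out[p] over predecessors (empty at entry)
            new_in = set(ALL) if pred[n] else set()
            for p in pred[n]:
                new_in &= out_[p]
            # out[n] = gen[n] U (in[n] - kill[n])
            new_out = gen[n] | (new_in - kill[n])
            if new_in != in_[n] or new_out != out_[n]:
                in_[n], out_[n], changed = new_in, new_out, True

    print(in_[3])   # {'x+y'}: x+y is available at node 3
    print(in_[4])   # {'x+y'}: x-y is NOT available at node 4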
Value Partitioning: One last look at redundancy elimination

On the other hand, SSA sometimes lets us do very clever things, based on the
fact that a phi function, although non-deterministic per entry, behaves the
same way on all the variables it merges at that block: it selects the same
incoming edge for each of them (hence the labels phi2 and phi5 below).

Example (Muchnick, Fig. 12.18):

        -------
        ENTRY
        -------
           |
           V
        ---------  B1
        i <- 1
        j <- 1
        ---------
           |
  |------->|
  |        V
  |   ----------------  B2
  |   i mod 2 == 0 ?
  |   ----------------
  |      |         |
  |      Y         N
  |      |         |
  |      V         V
  |   ------------ B3   ------------ B4
  |   i <- i + 1        i <- i + 3
  |   j <- j + 1        j <- j + 3
  |   ------------      ------------
  |       |                  |
  |       V                  V
  |       ---------  B5
  |       j <= C
  |       ---------
  |        |     |
  |        Y     N
  |        |     |
  |--------|     V
               ------
               EXIT
               ------

In SSA form:

        -------
        ENTRY
        -------
           |
           V
        ----------  B1
        i1 <- 1
        j1 <- 1
        ----------
           |
  |------->|
  |        V
  |   --------------------  B2
  |   i3 <- phi2(i2,i1)
  |   j3 <- phi2(j2,j1)
  |   i3 mod 2 == 0 ?
  |   --------------------
  |      |         |
  |      Y         N
  |      |         |
  |      V         V
  |   --------------- B3   --------------- B4
  |   i4 <- i3 + 1         i5 <- i3 + 3
  |   j4 <- j3 + 1         j5 <- j3 + 3
  |   ---------------      ---------------
  |       |                    |
  |       V                    V
  |       -------------------  B5
  |       i2 <- phi5(i4,i5)
  |       j2 <- phi5(j4,j5)
  |       j2 <= C
  |       -------------------
  |        |     |
  |        Y     N
  |        |     |
  |--------|     V
               ------
               EXIT
               ------

A *value graph* for a procedure is a labeled DAG whose nodes are labeled with
operators, function symbols, or constants and whose edges point from an
operator or function to its operands (the edges are labeled with the operand
position number).  Nodes are named by their SSA names (or by an arbitrary
name, if none).  [See Muchnick's example (p. 351) for the above.]

*Congruence* is defined as the maximal relation on the value graph such that
two nodes are congruent if (1) they are the same node, or (2) their labels
are equal constants, or (3) their labels are equal operators and their
operands are congruent.  Two variables are *equivalent* at a point p in a
program if they are congruent and their defining assignments dominate p.
Second and subsequent equivalent variables can be removed.

Compute congruence as the maximum fixed point of a partitioning process on
the value graph.  Initially, assume all nodes with the same label are
congruent; then repeatedly partition the congruence classes as necessary
until a fixed point is reached.  (A sketch of this process appears at the end
of these notes.)  In this example, we end up with the corresponding i and j
nodes congruent, so i and j are equivalent at all program points.

Solving Dataflow Equations

Completely general method: iteration to a fixed point.

Recall the available expressions problem.  To solve these equations we must
start with the optimistic assumption that all the in and out sets are *full*
(contain all possible expressions), and then iterate to reach a *greatest
fixed point*, i.e., the largest sets that solve the equations.  (If we
started with empty sets, the intersection in the in[n] equation could
permanently wipe out facts at loop headers, leaving a smaller, less useful
solution.)

Recall the first example in the lecture.  Let A = the set of all potentially
interesting expressions, namely {x+y, x-y, x+1}.

                                 iteration 0    iteration 1        iteration 2        iteration 3
   n  pred[n]  gen[n]  kill[n]   in[n]  out[n]  in[n]    out[n]    in[n]    out[n]    in[n]    out[n]
   1  -        x+y     -         -      A       -        x+y       -        x+y       -        x+y
   2  1        x-y     -         A      A       x+y      x+y,x-y   x+y      x+y,x-y   x+y      x+y,x-y
   3  2,7      x+y     -         A      A       x+y,x-y  x+y,x-y   x+y      x+y       x+y      x+y
   4  3        x-y     -         A      A       x+y,x-y  x+y,x-y   x+y      x+y,x-y   x+y      x+y,x-y
   5  4        -       A         A      A       x+y,x-y  -         x+y,x-y  -         x+y,x-y  -
   6  5        x+y     -         A      A       -        x+y       -        x+y       -        x+y
   7  6        -       -         A      A       x+y      x+y       x+y      x+y       x+y      x+y

Iteration 3 repeats iteration 2, so this is the (greatest) fixed point.
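Returning to the value-partitioning example, here is a small Python sketch
(the graph encoding and names are assumptions, not from Muchnick) of the
partitioning process: build the value graph for the SSA form of Fig. 12.18,
start with one congruence class per label, and split classes until the
operands of every class member agree.

    # Value graph: node -> (label, operands).  Nodes are named by SSA names,
    # plus "one"/"three" for the constant operands; operand position is the
    # index in the tuple.  This encoding is assumed, not from the notes.
    graph = {
        "one": ("1", ()),                 "three": ("3", ()),
        "i1": ("1", ()),                  "j1": ("1", ()),
        "i3": ("phi2", ("i2", "i1")),     "j3": ("phi2", ("j2", "j1")),
        "i4": ("+",    ("i3", "one")),    "j4": ("+",    ("j3", "one")),
        "i5": ("+",    ("i3", "three")),  "j5": ("+",    ("j3", "three")),
        "i2": ("phi5", ("i4", "i5")),     "j2": ("phi5", ("j4", "j5")),
    }

    def congruence_classes(graph):
        # Optimistic start: all nodes with the same label in one class.
        by_label = {}
        for n, (label, _) in graph.items():
            by_label.setdefault(label, set()).add(n)
        classes = list(by_label.values())

        changed = True
        while changed:
            changed = False
            class_of = {n: i for i, c in enumerate(classes) for n in c}
            refined = []
            for c in classes:
                # Split members whose operands lie in different classes.
                buckets = {}
                for n in c:
                    key = tuple(class_of[op] for op in graph[n][1])
                    buckets.setdefault(key, set()).add(n)
                refined.extend(buckets.values())
                changed |= len(buckets) > 1
            classes = refined
        return classes

    for c in congruence_classes(graph):
        print(sorted(c))
    # The + class splits into {i4,j4} and {i5,j5}; every other class stays put,
    # so each ik is congruent to the matching jk, i.e., i and j are equivalent
    # wherever both definitions dominate.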