SSA Form:
Static Single Assignment (SSA) Form.
 Every variable has one (static) definition (though defn. may be executed many times).
 For straight-line code, this is just like value numbering:
Original Code:
v <- 4
w <- v + 5
v <- 6
w <- v + 7
Code in SSA:
v1 <- 4
w1 <- v1 + 5
v2 <- 6
w2 <- v2 + 7
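The renaming for straight-line code can be sketched in a few lines of Python. This is only an illustration, assuming assignments are written as `x <- expr` and that every identifier in an expression is a variable; the function name `to_ssa` is made up for this sketch.

```python
import re

def to_ssa(lines):
    """Rename straight-line assignments 'x <- expr' into SSA form."""
    version = {}  # variable name -> number of its latest definition
    out = []
    for line in lines:
        target, expr = [part.strip() for part in line.split("<-")]

        # A use refers to the most recent definition of the variable.
        # Variables with no definition yet (inputs) are left unchanged.
        def rename(match):
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name

        expr = re.sub(r"[A-Za-z_]\w*", rename, expr)
        # Every definition gets a fresh version number.
        version[target] = version.get(target, 0) + 1
        out.append(f"{target}{version[target]} <- {expr}")
    return out

for stmt in to_ssa(["v <- 4", "w <- v + 5", "v <- 6", "w <- v + 7"]):
    print(stmt)
```

Uses are renamed before the target's version is bumped, so a statement such as `v <- v + 1` reads the old version and defines a new one.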
 For general flow, must introduce phi-nodes. These are fictional operations, (usually) not
intended to have execution significance. To interpret these nodes, must view code as a CFG
in which the in-edges to any node have a well-defined order: the k-th argument of a phi-node
is the value that arrives along the k-th in-edge.
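One way to read a phi-node operationally is as a selector indexed by the in-edge that control arrived on. The sketch below is hypothetical (the helper names `eval_phi` and `run_diamond` are invented here) and models a two-armed branch that joins at a single phi.

```python
def eval_phi(args, in_edge_index):
    """A phi-node yields the argument that matches the in-edge taken."""
    return args[in_edge_index]

def run_diamond(p):
    # Assumed edge order at the join: in-edge 0 comes from the 'true' arm,
    # in-edge 1 from the 'false' arm.
    if p:
        v1, v2, edge = 4, None, 0   # only v1 is defined on this path
    else:
        v1, v2, edge = None, 5, 1   # only v2 is defined on this path
    v3 = eval_phi((v1, v2), edge)   # v3 <- phi(v1, v2)
    return v3 + v3                  # w1 <- v3 + v3

print(run_diamond(True), run_diamond(False))
```

The argument that is not selected is never read, which is why it does not matter that it is undefined on the path actually taken.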
Example 1:
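Original CFG:

            +------+
            |  P?  |
            +------+
             /    \
            /      \
           V        V
    +--------+    +--------+
    | v <- 4 |    | v <- 5 |
    +--------+    +--------+
            \      /
             \    /
              V  V
        +------------+
        | w <- v + v |
        +------------+
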
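CFG in SSA form:

            +------+
            |  P?  |
            +------+
             /    \
            /      \
           V        V
   +---------+    +---------+
   | v1 <- 4 |    | v2 <- 5 |
   +---------+    +---------+
            \      /
             \    /
              V  V
     +------------------+
     | v3 <- phi(v1,v2) |
     | w1 <- v3 + v3    |
     +------------------+
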
Example 2:
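Original CFG:

   +---------+
   | i <- 0  |
   | j <- 0  |
   +---------+
        |
        |      +---------------------+
        |      |                     |
        V      V                     |
   +---------------+                 |
   | i > N ?       |                 |
   +---------------+                 |
       /      \                      |
      /        \                     |
   EXIT         V                    |
          +--------------+           |
          | j <- j + i   |           |
          | i <- j + 1   |           |
          +--------------+           |
                  |                  |
                  +------------------+
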
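CFG in SSA form:

   +-----------+
   | i1 <- 0   |
   | j1 <- 0   |
   +-----------+
         |
         |       +----------------------+
        1|      2|                      |
         V       V                      |
   +--------------------+               |
   | i2 = phi(i1,i3)    |               |
   | j2 = phi(j1,j3)    |               |
   | i2 > N ?           |               |
   +--------------------+               |
        /        \                      |
       /          \                     |
    EXIT           V                    |
             +----------------+         |
             | j3 <- j2 + i2  |         |
             | i3 <- j3 + 1   |         |
             +----------------+         |
                     |                  |
                     +------------------+

(The labels 1 and 2 give the order of the in-edges to the loop header, matching the order of the phi arguments.)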
Where should we put phi assignments, and for which variables? Simple answer: in every join node,
for every variable in the program. Too expensive!
(Suffices to put a phi assignment for x in join nodes that aren't dominated by a single definition of x.
More details later.)
Larger Scopes for Value Numbering:
 Assume unique names via SSA form.
1. "Superlocal." Do analysis over paths in extended basic blocks. An extended basic block has one entry,
but can have multiple exits. It forms a subtree of the CFG: the root block may have multiple predecessors;
the other blocks have a unique predecessor, which is inside the EBB.
2. "Dominator-based." Do the analysis over paths in the dominator tree.
Aside on dominators:
Assume the CFG has a distinguished start node S, and has no disconnected subgraphs (nodes unreachable from S).
Then we define that node d *dominates* node n if all paths from S to n include d.
In particular, for all n, n dominates n.
Fact: for n != S, d dominates n iff d = n or d dominates all predecessors of n.
So, the set D[n] of nodes that dominate n can be defined as:
D[S] = { S }
D[n] = { n } U (intersection over all p in pred[n] of D[p])
(where pred[n] = set of predecessors of n in the CFG)
Define the *immediate dominator* of n, idom(n) as follows:
(1) idom(n) dominates n
(2) idom(n) is not n
(3) idom(n) does not dominate any other dominator of n (except n itself)
Fact: every node (except S) has a unique immediate dominator.
Hence the immediate dominator relation defines a tree, called the *dominator tree*,
whose nodes are the nodes of the CFG, where the parent of a node is its immediate dominator.
Have D[n] = {n} U (proper ancestors of n in dominator tree)
Fact: The dominator tree of a CFG can be computed in almost-linear time.
(See textbook Ch. 9 for details.)
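The equations for D[n] can also be solved by simple iteration, which is slower than the almost-linear algorithm but easy to check by hand. The following Python sketch assumes the CFG is given as a successor map in which every node appears as a key and every node is reachable from the start node; the function names and the small example graph are hypothetical.

```python
def dominators(succ, start):
    """Solve D[n] = {n} U (intersection of D[p] over p in pred[n])."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    # Begin with the largest candidate sets and shrink to the fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def immediate_dominators(dom, start):
    """idom(n) is the strict dominator of n that all the others dominate,
    i.e. the one whose own dominator set is largest."""
    return {n: max(ds - {n}, key=lambda d: len(dom[d]))
            for n, ds in dom.items() if n != start}

# Hypothetical CFG: S -> B, B -> C | D, C -> E, D -> E, E -> B | X
succ = {"S": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["E"],
        "E": ["B", "X"], "X": []}
dom = dominators(succ, "S")
print(sorted(dom["E"]))
print(immediate_dominators(dom, "S"))
```

Neither C nor D dominates E, because E can be reached through either one; its immediate dominator is B.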
Even with dominators, cannot find redundant expressions computed on *different* paths.
A different approach: compute *available expressions*.
This is a classic *data flow analysis* problem.
In the SSA context, an expression is *available* at instr n if
it is computed at least once on *every* path from the entry node to n.
Application: If an expression is available at a node where it is being recomputed,
it is possible to replace the recomputation by
a variable representing the result of the previous computation.
To compute available expressions we can solve the following dataflow equations:
gen["t <- b bop c"] = {b bop c}
gen[other] = {}
in[s] = intersection over all p in pred[s] of out[p]
out[s] = in[s] U gen[s]
We're interested in computing in[s]; at the moment, we'll just wave
our hands to do this.
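Example (Muchnick, "Advanced Compiler Design & Implementation", Fig. 13.8):

                 ENTRY
                   |
                   V
        A +------------------+
          | c1 <- a1 + b1    |
          | d1 <- a1 * c1    |
          | e1 <- d1 * d1    |
          | i1 <- 1          |
          +------------------+
                   |      +--------------------------------+
                   V      V                                |
        B +--------------------+                           |
          | i3 = phi(i1,i2)    |                           |
          | c3 = phi(c1,c2)    |                           |
          | f[i3] <- a1 + b1   |                           |
          | c2 <- c3 * 2       |                           |
          | c2 > d1 ?          |                           |
          +--------------------+                           |
              | Y          N |                             |
              V              V                             |
 C +------------------+   D +------------------+           |
   | g[i3] <- a1 * c2 |     | g[i3] <- d1 * d1 |           |
   +------------------+     +------------------+           |
              |              |                             |
              V              V                             |
        E +------------------+                             |
          | i2 <- i3 + 1     |                             |
          | i2 > 10 ?        |                             |
          +------------------+                             |
              | Y          N |                             |
              V              +-----------------------------+
             EXIT
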
We have
gen[A] = {a1+b1,a1*c1,d1*d1}
gen[B] = {a1+b1,c3*2}
gen[C] = {a1*c2}
gen[D] = {d1*d1}
gen[E] = {i3+1}
Here's a solution (the maximal one, which is what we want)
in[A] = {} out[A] = {a1+b1,a1*c1,d1*d1}
in[B] = {a1+b1,a1*c1,d1*d1} out[B] = {a1+b1,a1*c1,d1*d1,c3*2}
in[C] = {a1+b1,a1*c1,d1*d1,c3*2} out[C] = {a1+b1,a1*c1,d1*d1,c3*2,a1*c2}
in[D] = {a1+b1,a1*c1,d1*d1,c3*2} out[D] = {a1+b1,a1*c1,d1*d1,c3*2}
in[E] = {a1+b1,a1*c1,d1*d1,c3*2} out[E] = {a1+b1,a1*c1,d1*d1,c3*2,i3+1}
So recomputations of a1+b1 in B and d1*d1 in D can be removed.
Here's another solution (a less useful one):
in[A] = {} out[A] = {a1+b1,a1*c1,d1*d1}
in[B] = {a1+b1} out[B] = {a1+b1,c3*2}
in[C] = {a1+b1,c3*2} out[C] = {a1+b1,c3*2,a1*c2}
in[D] = {a1+b1,c3*2} out[D] = {a1+b1,c3*2,d1*d1}
in[E] = {a1+b1,c3*2} out[E] = {a1+b1,c3*2,i3+1}
Note importance of taking "optimistic" view of in[B].
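Both solutions can be reproduced by iterating the dataflow equations from different starting points. The Python sketch below is one possible way to do this, not the method the notes go on to develop: the function name `available` and its `optimistic` flag are invented here, in[ENTRY block] is pinned to the empty set, and the blocks, edges, and gen sets are those of the example above.

```python
def available(pred, gen, entry, optimistic=True):
    """Iterate in[s] = intersection of out[p], out[s] = in[s] U gen[s]."""
    universe = set().union(*gen.values())
    # Optimistic: assume everything is available until shown otherwise.
    # Pessimistic: assume nothing is available until shown otherwise.
    init = universe if optimistic else set()
    in_ = {s: (set() if s == entry else set(init)) for s in gen}
    out = {s: in_[s] | gen[s] for s in gen}
    changed = True
    while changed:
        changed = False
        for s in gen:
            if s == entry:
                continue  # nothing is available on entry
            new_in = set.intersection(*(out[p] for p in pred[s]))
            new_out = new_in | gen[s]
            if new_in != in_[s] or new_out != out[s]:
                in_[s], out[s] = new_in, new_out
                changed = True
    return in_, out

pred = {"B": ["A", "E"], "C": ["B"], "D": ["B"], "E": ["C", "D"]}
gen = {
    "A": {"a1+b1", "a1*c1", "d1*d1"},
    "B": {"a1+b1", "c3*2"},
    "C": {"a1*c2"},
    "D": {"d1*d1"},
    "E": {"i3+1"},
}
in_opt, _ = available(pred, gen, "A", optimistic=True)
in_pes, _ = available(pred, gen, "A", optimistic=False)
print(sorted(in_opt["B"]))
print(sorted(in_pes["B"]))
```

Starting from the full set of expressions yields the maximal solution, in which d1*d1 reaches D; starting from empty sets gets stuck at the smaller one, because the back edge from E into B never contributes a1*c1 or d1*d1.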