Optimizations
=============

First, choose a suitable intermediate language for describing code.

Assume an infinite number of temporary registers. Also, assume
local variables and arguments are already in registers to start with.

Describe code using 3-address code (similar to the textbook's ILOC).
Instruction set (where a, b, c are registers or constants):

a <- b bop c			Binary operation (for any binary operator bop)
a <- b				Move
a <- M[b]			Memory fetch (from address b)
M[a] <- b			Memory store (to address a)
L:				Label
goto L				Unconditional branch
if a relop b goto L		Conditional branch (for any relational operator relop)
a <- f(a1,...,an)		Function call (where f is a fixed label or a computed address)

Note that this language is "lower-level" than JVM bytecodes in most respects, e.g., 
it exposes all address arithmetic needed for accessing array elements or object fields.
But it is "higher-level" in a few ways, e.g., function call arguments are explicitly listed
rather than being placed in a standard location on the stack.


Local Value Numbering 

- A simple optimization that works on straight-line code

Source fragment:

w = (x+y) + (u-v);
u = x + y;
x = u - v;

Corresponding bytecode (more or less) and possible JIT output:

push x	      		       
push y
add		add rx,ry,r0
push u
push v
sub		sub ru,rv,r1
add		
store w		add r0,r1,rw
push x	
push y
add		
store u		add rx,ry,ru
push u		
push v
sub		
store x		sub ru,rv,rx

Code in our intermediate 3-address code:

g <- x + y
h <- u - v
w <- g + h
u <- x + y
x <- u - v

Value Numbering:

Process each 3-addr instruction in order. Maintain a mapping from identifiers (x) and
binop expressions (left,op,right) to value numbers.  Whenever an entry for an instruction's
expression already exists, rewrite the instruction to use a name that holds that value.

Initial code	Final code    Mapping entries
g <- x + y	g <- x + y    x -> 1              1:x
			      y -> 2              2:y
			      (1,+,2) -> 3
			      g -> 3              3:g
h <- u - v	h <- u - v    u -> 4              4:u
		              v -> 5              5:v
			      (4,-,5) -> 6
			      h -> 6              6:h
w <- g + h      w <- g + h    (3,+,6) -> 7         
			      w -> 7              7:w
u <- x + y	u <- g	      u -> 3              
x <- u - v	x <- u - v    (3,-,5) -> 8        
		              x -> 8              8:x

Alternatively, could build DAG showing relationships between entries.
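
The table-driven procedure above can be sketched in Python. This is a minimal
sketch under stated assumptions, not a definitive implementation: the tuple
encoding of instructions, the function name local_value_numbering, and the
extra "holder" table (value number -> a name currently holding it, i.e. the
right-hand column of the table) are all choices made for illustration. It
handles only binop instructions in a single straight-line block.

```python
def local_value_numbering(block):
    """Rewrite one straight-line block of (dest, left, op, right) tuples.

    Returns a list whose items are either the original 4-tuple or a
    2-tuple (dest, src) standing for the move dest <- src.
    """
    var_vn = {}    # identifier -> value number
    expr_vn = {}   # (left vn, op, right vn) -> value number
    holder = {}    # value number -> a name currently holding it, or None
    counter = 0

    def vn_of(name):
        # First use of a name gives it a fresh value number.
        nonlocal counter
        if name not in var_vn:
            counter += 1
            var_vn[name] = counter
            holder[counter] = name
        return var_vn[name]

    out = []
    for dest, left, op, right in block:
        key = (vn_of(left), op, vn_of(right))
        if key not in expr_vn:
            counter += 1
            expr_vn[key] = counter
        vn = expr_vn[key]
        src = holder.get(vn)
        if src is not None:
            out.append((dest, src))              # value is available: reuse it
        else:
            out.append((dest, left, op, right))  # must (re)compute it
        # dest is overwritten, so it no longer holds its old value.
        old = var_vn.get(dest)
        if old is not None and holder.get(old) == dest:
            holder[old] = None
        var_vn[dest] = vn
        if holder.get(vn) is None:
            holder[vn] = dest
    return out


example = [
    ("g", "x", "+", "y"),
    ("h", "u", "-", "v"),
    ("w", "g", "+", "h"),
    ("u", "x", "+", "y"),
    ("x", "u", "-", "v"),
]
rewritten = local_value_numbering(example)
```

On the example block, the fourth instruction becomes the move ("u", "g") and
the others are unchanged, as in the table. When the only name holding a value
has been overwritten, the sketch simply keeps the original computation.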

Issues with Naming:

- If there are (re-)assignments, a name is not the same thing as a value!

- Value numbering successfully distinguishes between different values with
the same name (e.g., u in the example).

- But can still lose access to a value if its name gets overwritten.

Modified Example:

Initial code	Final code    Mapping entries
z <- x + y	z <- x + y    x -> 1		  1:x
			      y -> 2              2:y
			      (1,+,2) -> 3
			      z -> 3              3:z
h <- u - v	h <- u - v    u -> 4              4:u
		              v -> 5              5:v
			      (4,-,5) -> 6
			      h -> 6              6:h
z <- z + h      z <- z + h    (3,+,6) -> 7
                              z -> 7              7:z  3:??
u <- x + y	u <- ??	      

What to do?  Rename variables so that every assignment gets a unique name.

Initial code    Renamed code     Final code	  Mapping entries

z <- x + y	z0 <- x0 + y0    z0 <- x0 + y0    x0 -> 1          1:x0
			                          y0 -> 2	   2:y0
			                          (1,+,2) -> 3
			                          z0 -> 3	   3:z0
h <- u - v	h0 <- u0 - v0    h0 <- u0 - v0    u0 -> 4	   4:u0
		                                  v0 -> 5	   5:v0
			                          (4,-,5) -> 6
			                          h0 -> 6	   6:h0
z <- z + h      z1 <- z0 + h0    z1 <- z0 + h0    (3,+,6) -> 7
                                                  z1 -> 7	   7:z1
u <- x + y	u1 <- x0 + y0    u1 <- z0         u1 -> 3          
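
The renaming step for straight-line code can be sketched as follows. This is
an illustrative sketch, not a definitive implementation: the function name
rename_straight_line and the tuple encoding are assumptions, and forming new
names by appending a version number assumes no source variable already ends
in a digit. A variable read before any assignment gets version 0, and each
assignment creates the next version.

```python
def rename_straight_line(block):
    """Give every assignment in a straight-line block a unique name.

    block is a list of (dest, left, op, right) tuples; the result has the
    same shape with versioned names such as "z0", "z1".
    """
    version = {}  # base name -> current version index

    def use(name):
        # A name read before being assigned refers to its incoming version 0.
        version.setdefault(name, 0)
        return f"{name}{version[name]}"

    def define(name):
        # Each assignment creates a new version (0 if the name is unseen).
        version[name] = version[name] + 1 if name in version else 0
        return f"{name}{version[name]}"

    out = []
    for dest, left, op, right in block:
        # Operands are renamed first: they refer to the versions that are
        # live before this instruction overwrites dest.
        new_left, new_right = use(left), use(right)
        out.append((define(dest), new_left, op, new_right))
    return out


renamed = rename_straight_line([
    ("z", "x", "+", "y"),
    ("h", "u", "-", "v"),
    ("z", "z", "+", "h"),
    ("u", "x", "+", "y"),
])
```

For the modified example this produces the "Renamed code" column of the
table. Value numbering on the renamed block can then rewrite the last
instruction to u1 <- z0, because z0 is never overwritten.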

What about control flow joins?  

Initial code		   Renamed code

     if a > 0 goto L1	   if a0 > 0 goto L1
     b <- x + y		   b0 <- x0 + y0
     goto L2		   goto L2
L1:  b <- x - y	       L1: b1 <- x0 - y0
L2:  c <- a + b	       L2: c0 <- a0 + ??

Will consider this shortly in context of SSA form.

Control Flow Graph (CFG)

Simple form: one node per instruction.  Edge from node a to node b if there is any possibility
of control flowing directly from a to b.

Example program (Appel, "Modern Compiler Implementation,"  example 17.3)

1		a <- 5 
2		c <- 1
3	L1:	if c > a goto L2
4		c <- c + c
5		goto L1
6	L2:	a <- c - a
7		c <- 0
 
CFG: The nodes correspond to instrs 1 through 7, with edges: 1->2, 2->3, 3->4, 3->6, 4->5, 5->3, 6->7
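
The edge list above can be derived mechanically. The following is a minimal
sketch, assuming a made-up encoding in which each instruction is a tuple
(label, kind, target) with kind one of "op", "goto", or "if"; the name
build_cfg is likewise hypothetical. Function calls are treated as ordinary
"op" instructions here.

```python
def build_cfg(instrs):
    """Return the sorted CFG edges (i, j), using 1-based instruction numbers.

    instrs is a list of (label, kind, target) tuples, where label is the
    label attached to the instruction (or None), kind is "op", "goto", or
    "if", and target is the branch target label (or None).
    """
    label_at = {lab: i for i, (lab, _, _) in enumerate(instrs, start=1) if lab}
    n = len(instrs)
    edges = []
    for i, (_, kind, target) in enumerate(instrs, start=1):
        if kind in ("goto", "if"):
            edges.append((i, label_at[target]))  # control may jump to target
        if kind != "goto" and i < n:
            edges.append((i, i + 1))             # control may fall through
    return sorted(edges)


program = [
    (None, "op", None),    # 1       a <- 5
    (None, "op", None),    # 2       c <- 1
    ("L1", "if", "L2"),    # 3  L1:  if c > a goto L2
    (None, "op", None),    # 4       c <- c + c
    (None, "goto", "L1"),  # 5       goto L1
    ("L2", "op", None),    # 6  L2:  a <- c - a
    (None, "op", None),    # 7       c <- 0
]
edges = build_cfg(program)
```

A conditional branch contributes two edges (target and fall-through), an
unconditional branch one, and every other instruction just the fall-through.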

Often useful to factor a program into "basic blocks," corresponding to sequences of
"straight-line code."  A basic block is a sequence of consecutive instrs in which control
always enters from the top and exits from the bottom.   In this example, the basic
blocks are {1,2},{3},{4,5},{6,7}.
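
One common way to find basic blocks is to mark "leaders": the first
instruction, every branch target, and every instruction that follows a
branch. Each block then runs from a leader up to the next leader. The sketch
below assumes a hypothetical encoding in which each instruction is a tuple
(label, kind, target) with kind one of "op", "goto", or "if"; the function
name basic_blocks is also an assumption.

```python
def basic_blocks(instrs):
    """Partition instructions into basic blocks of 1-based instruction numbers.

    instrs is a list of (label, kind, target) tuples, where kind is "op",
    "goto", or "if" and target is the branch target label (or None).
    """
    if not instrs:
        return []
    label_at = {lab: i for i, (lab, _, _) in enumerate(instrs, start=1) if lab}
    n = len(instrs)
    leaders = {1}                            # the first instruction
    for i, (_, kind, target) in enumerate(instrs, start=1):
        if kind in ("goto", "if"):
            leaders.add(label_at[target])    # a branch target starts a block
            if i < n:
                leaders.add(i + 1)           # so does the instr after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [n + 1]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


program = [
    (None, "op", None),    # 1       a <- 5
    (None, "op", None),    # 2       c <- 1
    ("L1", "if", "L2"),    # 3  L1:  if c > a goto L2
    (None, "op", None),    # 4       c <- c + c
    (None, "goto", "L1"),  # 5       goto L1
    ("L2", "op", None),    # 6  L2:  a <- c - a
    (None, "op", None),    # 7       c <- 0
]
blocks = basic_blocks(program)
```

On the example program the leaders are instructions 1, 3, 4, and 6, which
yields the four blocks listed above.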