Optimizations
=============
First, choose a suitable intermediate language for describing code.
Assume an infinite number of temporary registers. Also, assume
local variables and arguments are already in registers to start with.
Describe code using 3-address code (similar to the textbook's ILOC).
Instruction set (where a, b, c are registers or constants):
a <- b bop c            Binary operation (for any binary operator bop)
a <- b                  Move
a <- M[b]               Memory fetch (from address b)
M[a] <- b               Memory store (to address a)
L:                      Label
goto L                  Unconditional branch
if a relop b goto L     Conditional branch (for any relational operator relop)
a <- f(a1,...,an)       Function call (where f is a fixed label or a computed address)
Note that this language is "lower-level" than JVM bytecodes in most respects, e.g.,
it exposes all address arithmetic needed for accessing array elements or object fields.
But it is "higher-level" in a few ways, e.g., function call arguments are explicitly listed
rather than being placed in a standard location on the stack.
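One possible in-memory representation of this instruction set is sketched below in Python. The class and field names (BinOp, Move, Load, Store, Label, Goto, CondGoto, Call) are illustrative assumptions, not part of the notes; each class simply prints itself in the notation used above.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical class names; operands are plain strings (register names or constants).

@dataclass(frozen=True)
class BinOp:            # a <- b bop c
    dest: str
    left: str
    op: str
    right: str
    def __str__(self):
        return f"{self.dest} <- {self.left} {self.op} {self.right}"

@dataclass(frozen=True)
class Move:             # a <- b
    dest: str
    src: str
    def __str__(self):
        return f"{self.dest} <- {self.src}"

@dataclass(frozen=True)
class Load:             # a <- M[b]
    dest: str
    addr: str
    def __str__(self):
        return f"{self.dest} <- M[{self.addr}]"

@dataclass(frozen=True)
class Store:            # M[a] <- b
    addr: str
    src: str
    def __str__(self):
        return f"M[{self.addr}] <- {self.src}"

@dataclass(frozen=True)
class Label:            # L:
    name: str
    def __str__(self):
        return f"{self.name}:"

@dataclass(frozen=True)
class Goto:             # goto L
    target: str
    def __str__(self):
        return f"goto {self.target}"

@dataclass(frozen=True)
class CondGoto:         # if a relop b goto L
    left: str
    relop: str
    right: str
    target: str
    def __str__(self):
        return f"if {self.left} {self.relop} {self.right} goto {self.target}"

@dataclass(frozen=True)
class Call:             # a <- f(a1,...,an), arguments listed explicitly
    dest: str
    func: str
    args: Tuple[str, ...]
    def __str__(self):
        return f"{self.dest} <- {self.func}({', '.join(self.args)})"
```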
Local Value Numbering
- A simple optimization that works on straight-line code
Source fragment:
w = (x+y) + (u-v);
u = x + y;
x = u - v;
Corresponding bytecode (more or less) and possible JIT output:
Bytecode      JIT output
push x
push y
add           add rx,ry,r0
push u
push v
sub           sub ru,rv,r1
add
store w       add r0,r1,rw
push x
push y
add
store u       add rx,ry,ru
push u
push v
sub
store x       sub ru,rv,rx
Code in our intermediate 3-address code:
g <- x + y
h <- u - v
w <- g + h
u <- x + y
x <- u - v
Value Numbering:
Process each 3-addr instruction in order. Maintain a mapping from identifiers (x) and
binop expressions (left,op,right) to value numbers. Whenever an entry already exists
for the expression being computed, rewrite the instruction to use it.
Initial code    Final code      Mapping entries
g <- x + y      g <- x + y      x -> 1            1:x
                                y -> 2            2:y
                                (1,+,2) -> 3
                                g -> 3            3:g
h <- u - v      h <- u - v      u -> 4            4:u
                                v -> 5            5:v
                                (4,-,5) -> 6
                                h -> 6            6:h
w <- g + h      w <- g + h      (3,+,6) -> 7
                                w -> 7            7:w
u <- x + y      u <- g          u -> 3
x <- u - v      x <- u - v      (3,-,5) -> 8
                                x -> 8            8:x
Alternatively, could build DAG showing relationships between entries.
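The procedure can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name, the tuple encoding of instructions, and the `holder` table (value number -> a name believed to hold that value) are assumptions made for the example. It handles only binop instructions, and it conservatively recomputes an expression if the name that held its value has since been overwritten.

```python
def local_value_numbering(instrs):
    """instrs: list of (dest, left, op, right) meaning `dest <- left op right`.
    Returns rewritten instructions; a reused value becomes a move (dest, src)."""
    var_vn = {}    # identifier -> value number
    expr_vn = {}   # (left_vn, op, right_vn) -> value number
    holder = {}    # value number -> name recorded as holding it (assumed helper table)
    counter = 0
    out = []

    def vn_of(name):
        # Give a fresh value number to a name seen for the first time.
        nonlocal counter
        if name not in var_vn:
            counter += 1
            var_vn[name] = counter
            holder[counter] = name
        return var_vn[name]

    for dest, left, op, right in instrs:
        key = (vn_of(left), op, vn_of(right))
        vn = expr_vn.get(key)
        # Reuse only if the recorded holder still carries that value number.
        if vn is not None and var_vn.get(holder.get(vn)) == vn:
            out.append((dest, holder[vn]))
        else:
            if vn is None:
                counter += 1
                vn = counter
                expr_vn[key] = vn
            out.append((dest, left, op, right))
            holder[vn] = dest
        var_vn[dest] = vn
    return out


code = [
    ("g", "x", "+", "y"),
    ("h", "u", "-", "v"),
    ("w", "g", "+", "h"),
    ("u", "x", "+", "y"),
    ("x", "u", "-", "v"),
]
for ins in local_value_numbering(code):
    print(ins)
```

On the example, the fourth instruction is rewritten to the move `u <- g`, while `x <- u - v` is kept because u now carries value number 3 and (3,-,5) has not been seen before.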
Issues with Naming:
- If there are (re-)assignments, a name is not the same thing as a value!
- Value numbering successfully distinguishes between different values with
the same name (e.g., u in the example).
- But can still lose access to a value if its name gets overwritten.
Modified Example:
Initial code    Final code      Mapping entries
z <- x + y      z <- x + y      x -> 1            1:x
                                y -> 2            2:y
                                (1,+,2) -> 3
                                z -> 3            3:z
h <- u - v      h <- u - v      u -> 4            4:u
                                v -> 5            5:v
                                (4,-,5) -> 6
                                h -> 6            6:h
z <- z + h      z <- z + h      (3,+,6) -> 7
                                z -> 7            7:z  3:??
u <- x + y      u <- ??
What to do? Rename variables so that every assignment gets a unique name.
Initial code    Renamed code        Final code          Mapping entries
z <- x + y      z0 <- x0 + y0       z0 <- x0 + y0       x0 -> 1           1:x0
                                                        y0 -> 2           2:y0
                                                        (1,+,2) -> 3
                                                        z0 -> 3           3:z0
h <- u - v      h0 <- u0 - v0       h0 <- u0 - v0       u0 -> 4           4:u0
                                                        v0 -> 5           5:v0
                                                        (4,-,5) -> 6
                                                        h0 -> 6           6:h0
z <- z + h      z1 <- z0 + h0       z1 <- z0 + h0       (3,+,6) -> 7
                                                        z1 -> 7           7:z1
u <- x + y      u1 <- x0 + y0       u1 <- z0            u1 -> 3
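The renaming step for straight-line code can be sketched as below. This is an illustrative assumption about how one might implement it, not the notes' own code: the function name and the numeric-suffix scheme are hypothetical, names first seen as a use get version 0, and the sketch ignores possible clashes with source names that already end in a digit.

```python
def rename_straight_line(instrs):
    """instrs: list of (dest, left, op, right) meaning `dest <- left op right`.
    Returns the same instructions with every assignment given a unique name."""
    version = {}  # source name -> index of its most recent version

    def use(name):
        # A name read before any assignment refers to its initial version 0.
        version.setdefault(name, 0)
        return f"{name}{version[name]}"

    def define(name):
        # The first appearance gets version 0; each later assignment bumps it.
        version[name] = version[name] + 1 if name in version else 0
        return f"{name}{version[name]}"

    out = []
    for dest, left, op, right in instrs:
        # Operands must be renamed before the destination, so that
        # `z <- z + h` reads the old z and writes the new one.
        l, r = use(left), use(right)
        out.append((define(dest), l, op, r))
    return out


code = [
    ("z", "x", "+", "y"),
    ("h", "u", "-", "v"),
    ("z", "z", "+", "h"),
    ("u", "x", "+", "y"),
]
for ins in rename_straight_line(code):
    print(ins)
```

After renaming, z0 is never overwritten, so value numbering can still refer to it when it reaches the last instruction.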
What about control flow joins?
Initial code            Renamed code
if a > 0 goto L1        if a0 > 0 goto L1
b <- x + y              b0 <- x0 + y0
goto L2                 goto L2
L1: b <- x - y          L1: b1 <- x0 - y0
L2: c <- a + b          L2: c0 <- a0 + ??
Will consider this shortly in context of SSA form.
Control Flow Graph (CFG)
Simple form: one node per instruction. Edge from node a to node b if there is any possibility
of control flowing directly from a to b.
Example program (Appel, "Modern Compiler Implementation," example 17.3)
1       a <- 5
2       c <- 1
3  L1:  if c > a goto L2
4       c <- c + c
5       goto L1
6  L2:  a <- c - a
7       c <- 0
CFG: The nodes correspond to instrs 1 through 7, with edges: 1->2, 2->3, 3->4, 3->6, 4->5, 5->3, 6->7
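The edge list above can be derived mechanically. The Python sketch below assumes a simplified encoding chosen for this example only: each instruction is a (label, kind, target) triple, where kind is "op", "goto" or "if", and instructions are numbered from 1 as in the listing.

```python
def build_cfg(instrs):
    """instrs: list of (label, kind, target) triples, numbered from 1.
    Returns the sorted list of CFG edges (from_instr, to_instr)."""
    # Map each label to the number of the instruction it is attached to.
    label_at = {lab: i for i, (lab, _, _) in enumerate(instrs, start=1) if lab}
    edges = []
    for i, (_, kind, target) in enumerate(instrs, start=1):
        if kind in ("goto", "if"):
            edges.append((i, label_at[target]))   # branch edge
        if kind != "goto" and i < len(instrs):
            edges.append((i, i + 1))              # fall-through edge
    return sorted(edges)


program = [
    (None, "op", None),    # 1      a <- 5
    (None, "op", None),    # 2      c <- 1
    ("L1", "if", "L2"),    # 3  L1: if c > a goto L2
    (None, "op", None),    # 4      c <- c + c
    (None, "goto", "L1"),  # 5      goto L1
    ("L2", "op", None),    # 6  L2: a <- c - a
    (None, "op", None),    # 7      c <- 0
]
print(build_cfg(program))
```

An unconditional goto contributes only its branch edge, a conditional branch contributes both a branch edge and a fall-through edge, and every other instruction just falls through.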
Often useful to factor a program into "basic blocks," corresponding to sequences of
"straight-line code." A basic block is a sequence of consecutive instrs in which control
always enters from the top and exits from the bottom. In this example, the basic
blocks are {1,2},{3},{4,5},{6,7}.
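One common way to find basic blocks is to mark "leaders": the first instruction, every branch target, and every instruction that follows a branch. Each block then runs from a leader up to the next one. The Python sketch below assumes the same illustrative (label, kind, target) encoding, with kind being "op", "goto" or "if"; the function name is hypothetical.

```python
def basic_blocks(instrs):
    """instrs: list of (label, kind, target) triples, numbered from 1.
    Returns a list of blocks, each a list of instruction numbers."""
    if not instrs:
        return []
    n = len(instrs)
    label_at = {lab: i for i, (lab, _, _) in enumerate(instrs, start=1) if lab}
    leaders = {1}                              # the first instruction starts a block
    for i, (_, kind, target) in enumerate(instrs, start=1):
        if kind in ("goto", "if"):
            leaders.add(label_at[target])      # a branch target starts a block
            if i < n:
                leaders.add(i + 1)             # so does the instruction after a branch
    starts = sorted(leaders)
    # Each block extends from its leader to just before the next leader.
    return [list(range(s, e)) for s, e in zip(starts, starts[1:] + [n + 1])]


program = [
    (None, "op", None),    # 1      a <- 5
    (None, "op", None),    # 2      c <- 1
    ("L1", "if", "L2"),    # 3  L1: if c > a goto L2
    (None, "op", None),    # 4      c <- c + c
    (None, "goto", "L1"),  # 5      goto L1
    ("L2", "op", None),    # 6  L2: a <- c - a
    (None, "op", None),    # 7      c <- 0
]
print(basic_blocks(program))
```

For the example program the leaders are instructions 1, 3, 4 and 6, which gives the four blocks listed above.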