Optimizations
(See Appel, "Modern Compiler Implementation")

First, choose a suitable intermediate language for describing code.
Assume an infinite number of temporary registers.
Also, assume local variables and arguments are already in registers to start with.

Describe code using "quads".

Instruction set (where a, b, c are registers or constants):

  a <- b bop c             Binary operation (for any binary operator bop)
  a <- b                   Move
  a <- M[b]                Memory fetch (from address b)
  M[a] <- b                Memory store (to address a)
  L:                       Label
  goto L                   Unconditional branch
  if a relop b goto L      Conditional branch (for any relational operator relop)
  a <- f(a1,...,an)        Function call (where f is a fixed label or a computed address)

Note that this language is "lower-level" than JVM bytecodes in most respects, e.g., it exposes all address arithmetic needed for accessing array elements or object fields. But it is "higher-level" in a few ways, e.g., function call arguments are explicitly listed rather than being placed in a standard location on the stack.

Local Value Numbering
- A simple optimization that works on straight-line code

Source fragment:

  w = (x+y) + (u-v);
  u = x + y;
  x = u - v;

Corresponding bytecode (more or less) and "Version 3" JIT output:

  Bytecode      JIT output
  push x
  push y
  add           add rx,ry,r0
  push u
  push v
  sub           sub ru,rv,r1
  add           add r0,r1,r0
  store w       mov r0,rw
  push x
  push y
  add           add rx,ry,r0
  store u       mov r0,ru
  push u
  push v
  sub           sub ru,rv,r0
  store x       mov r0,rx

Code in our intermediate language of quads:

  g <- x + y
  h <- u - v
  w <- g + h
  u <- x + y
  x <- u - v

Value Numbering:
Process each quad in order. Maintain a mapping from identifiers (x) and binop expressions (left,op,right) to value numbers. Whenever an entry already exists for the expression a quad computes, rewrite the quad to use the identifier that already holds that value.
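The procedure just described can be sketched in Python. This is a minimal sketch under stated assumptions, not the notes' own implementation: quads are restricted to the binary-operation form and written as tuples (dst, left, op, right), a rewritten quad is emitted as a move (dst, src), and the names value_number, numbers and holder are hypothetical. The check that a holder still carries its value number is an added safeguard that the notes do not spell out.

```python
def value_number(quads):
    """Local value numbering over straight-line quads (dst, left, op, right).

    Returns (rewritten_quads, numbers). A redundant quad becomes a move (dst, src).
    """
    numbers = {}      # identifier or (left_vn, op, right_vn) -> value number
    holder = {}       # value number -> identifier that was assigned that value
    next_number = 0

    def fresh():
        nonlocal next_number
        next_number += 1
        return next_number

    def number_of(name):
        # Operands seen for the first time get a fresh value number.
        if name not in numbers:
            numbers[name] = fresh()
        return numbers[name]

    rewritten = []
    for dst, left, op, right in quads:
        key = (number_of(left), op, number_of(right))
        vn = numbers.get(key)
        src = holder.get(vn)
        # Assumption: reuse only if src has not been redefined since.
        reusable = src is not None and numbers.get(src) == vn
        if reusable:
            rewritten.append((dst, src))
        else:
            if vn is None:
                vn = numbers[key] = fresh()
            rewritten.append((dst, left, op, right))
            holder[vn] = dst
        numbers[dst] = vn
    return rewritten, numbers


example = [
    ("g", "x", "+", "y"),
    ("h", "u", "-", "v"),
    ("w", "g", "+", "h"),
    ("u", "x", "+", "y"),
    ("x", "u", "-", "v"),
]
final_quads, final_numbers = value_number(example)
for quad in final_quads:
    print(quad)
```

On the five example quads, only the fourth is rewritten (to a move from g). The last quad is left alone because u now carries a different value number than it did when h was computed.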

  Initial quads    Final quads      Mapping entries
  g <- x + y       g <- x + y       x -> 1   y -> 2   (1,+,2) -> 3   g -> 3
  h <- u - v       h <- u - v       u -> 4   v -> 5   (4,-,5) -> 6   h -> 6
  w <- g + h       w <- g + h       (3,+,6) -> 7   w -> 7
  u <- x + y       u <- g           u -> 3
  x <- u - v       x <- u - v       (3,-,5) -> 8   x -> 8

Can also draw DAG showing relationships between entries.

Control Flow Graph (CFG)

Simple form: one node per quad. Edge from node a to node b if there is any possibility of control flowing directly from a to b.

Example program:

  1        a <- 5
  2        c <- 1
  3   L1:  if c > a goto L2
  4        c <- c + c
  5        goto L1
  6   L2:  a <- c - a
  7        c <- 0

CFG: The nodes correspond to quads 1 through 7, with edges:

  1->2, 2->3, 3->4, 3->6, 4->5, 5->3, 6->7

Often useful to factor a program into "basic blocks," corresponding to sequences of "straight-line code." A basic block is a sequence of consecutive quads in which control always enters from the top and exits from the bottom. In this example, the basic blocks are {1,2}, {3}, {4,5}, {6,7}.

Dataflow problem: Reaching Definitions

Question: Given a program, which variable definitions are relevant to (or "visible" to, or "reach") which variable uses?

This is relevant for, e.g., constant propagation. If exactly one definition of x reaches a particular use of x, and that definition assigns a constant to x, then that use of x can be replaced by the constant. E.g., in the example above, the only definition of a that reaches quad 3 is the one in quad 1, so we can replace a with 5 in quad 3.

Formalisation:

A *definition* of t is a quad of one of these forms:

  t <- a bop b
  t <- u
  t <- M[a]

A *use* of t is a quad where t appears as the operand to a bop, relop, M[], function argument, or right-hand side of the <-.

A definition d of t *reaches* a use u if there exists some path from d to u that doesn't redefine t at any internal point.

Can compute sets of reaching definitions via *dataflow equations*.

For variable t, define

  def[t] = set of quads that define t

For each quad n, define

  gen[n]  = if n is a definition then {n} else {}
  kill[n] = if n is a definition of t then def[t] - {n} else {}
  pred[n] = set of predecessor nodes of n in CFG

The sets of definitions that reach the beginning (in) and end (out) of each quad n can be defined as

  in[n]  = union over all p in pred[n] of out[p]
  out[n] = gen[n] U (in[n] - kill[n])

Can solve by iteration, starting with empty sets, until nothing changes; get *least fixed point* of recursive equations.

For example (within each iteration the quads are processed in order 1 to 7, each using the most recently computed sets):

                             iteration 0   iteration 1   iteration 2   iteration 3
  n  pred[n] gen[n] kill[n]  in[n] out[n]  in[n] out[n]  in[n] out[n]  in[n] out[n]
  1  -       1      6        -     -       -     1       -     1       -     1
  2  1       2      4,7      -     -       1     1,2     1     1,2     1     1,2
  3  2,5     -      -        -     -       1,2   1,2     1,2,4 1,2,4   1,2,4 1,2,4
  4  3       4      2,7      -     -       1,2   1,4     1,2,4 1,4     1,2,4 1,4
  5  4       -      -        -     -       1,4   1,4     1,4   1,4     1,4   1,4
  6  3       6      1        -     -       1,2   2,6     1,2,4 2,4,6   1,2,4 2,4,6
  7  6       7      2,4      -     -       2,6   6,7     2,4,6 6,7     2,4,6 6,7

Iteration 3 repeats iteration 2, so the fixed point has been reached. This confirms that the only definition of a that reaches quad 3 is the one in quad 1: in[3] = {1,2,4}, and of these only quad 1 defines a.
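
The iteration above can be sketched in Python. This is a minimal sketch under stated assumptions: the program is described by two hypothetical dictionaries, defined_var (quad number to the variable it defines, or None) and pred (quad number to its CFG predecessors), and the function name reaching_definitions is likewise illustrative. Quads are visited in increasing order and updated in place, which matches the order of the table but is only one of several valid ways to reach the same least fixed point.

```python
def reaching_definitions(defined_var, pred):
    """Solve the in/out equations by iteration; returns (in_sets, out_sets)."""
    nodes = sorted(pred)

    # def[t]: all quads that define variable t.
    def_sites = {}
    for n, var in defined_var.items():
        if var is not None:
            def_sites.setdefault(var, set()).add(n)

    gen, kill = {}, {}
    for n in nodes:
        var = defined_var[n]
        gen[n] = {n} if var is not None else set()
        kill[n] = def_sites[var] - {n} if var is not None else set()

    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}

    changed = True
    while changed:
        changed = False
        for n in nodes:
            # in[n] = union of out[p] over predecessors p
            new_in = set().union(*(out_sets[p] for p in pred[n]))
            # out[n] = gen[n] U (in[n] - kill[n])
            new_out = gen[n] | (new_in - kill[n])
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets, out_sets


# The seven-quad example program and its CFG edges.
defined_var = {1: "a", 2: "c", 3: None, 4: "c", 5: None, 6: "a", 7: "c"}
pred = {1: [], 2: [1], 3: [2, 5], 4: [3], 5: [4], 6: [3], 7: [6]}

in_sets, out_sets = reaching_definitions(defined_var, pred)
for n in sorted(pred):
    print(n, sorted(in_sets[n]), sorted(out_sets[n]))
```

For the example program the result agrees with the last column pair of the table, e.g. in[3] = {1,2,4} and out[7] = {6,7}.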