CS302 Spr'99 Lecture Notes
Lecture 11
Code Optimization

$\bullet$ (Really ``improvement'' rather than ``optimization''; results are seldom optimal.)

$\bullet$ Remove inefficiencies in user code and (more importantly) in compiler-generated code.

$\bullet$ Can be applied at several levels, chiefly intermediate or assembly code.

$\bullet$ Can operate over several scopes:

- ``Peephole'' : very local IR or assembly

- ``Local'' : within basic blocks

- ``Global'' : entire procedures

- ``Interprocedural'' : entire programs (maybe even multiple source files)

$\bullet$ Theoretical tools: graph algorithms, control and data flow analysis.

$\bullet$ Practical tools: few.

$\bullet$ Most of a serious modern compiler is devoted to optimization.

Peephole Optimizations

$\bullet$ Look at short sequences of statements (in IR or assembly code)

$\bullet$ Correct inefficiencies produced by excessively local code generation strategies.

$\bullet$ Repeat!

$\bullet$ Redundant instructions (e.g., back-to-back fmovd moves)

$\bullet$ Unreachable code

\begin{code}LOOP IF X > 2 THEN EXIT ELSE X := X + 1 END;

L1: IF X > 2 GOTO ...
GOTO L1 ; never executed
L3: X := X + 1
L4: ...\end{code}
$\bullet$ Flow-of-control fixes: remove jumps to jumps, e.g.,

\begin{code}L1: IF X > 2 GOTO L4
X := X + 1
L4: ...\end{code}
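The control-flow fixes above can be sketched as a small pass that is repeated until nothing changes. This is a minimal illustration, not the course's implementation: the tuple encoding of instructions, the function names, and the "naive" translation of the LOOP statement in `NAIVE_LOOP` are all assumptions made for the example (the notes elide the original naive IR).

```python
# Hypothetical IR encoding: ("label", name), ("goto", target),
# ("if_goto", condition, target), ("op", text).
JUMPS = ("goto", "if_goto")

def thread_jumps(code):
    """Retarget any jump whose destination label is followed by a plain goto."""
    forward = {}
    for i, ins in enumerate(code):
        if ins[0] == "label":
            j = i + 1
            while j < len(code) and code[j][0] == "label":
                j += 1
            if j < len(code) and code[j][0] == "goto":
                forward[ins[1]] = code[j][1]

    def resolve(label):
        seen = set()  # guards against cycles such as "L: goto L"
        while label in forward and label not in seen:
            seen.add(label)
            label = forward[label]
        return label

    return [ins[:-1] + (resolve(ins[-1]),) if ins[0] in JUMPS else ins
            for ins in code]

def drop_unused_labels(code):
    """Remove labels no jump refers to (assumes no external entry points)."""
    used = {ins[-1] for ins in code if ins[0] in JUMPS}
    return [ins for ins in code if ins[0] != "label" or ins[1] in used]

def remove_unreachable(code):
    """Drop instructions between an unconditional goto and the next label."""
    out, reachable = [], True
    for ins in code:
        if ins[0] == "label":
            reachable = True
        if reachable:
            out.append(ins)
        if ins[0] == "goto":
            reachable = False
    return out

def drop_jump_to_next(code):
    """Drop a goto whose target label immediately follows it."""
    out = []
    for i, ins in enumerate(code):
        if ins[0] == "goto":
            j, following = i + 1, set()
            while j < len(code) and code[j][0] == "label":
                following.add(code[j][1])
                j += 1
            if ins[1] in following:
                continue
        out.append(ins)
    return out

def peephole(code):
    """Apply all fixes repeatedly until a fixed point is reached."""
    steps = (thread_jumps, drop_unused_labels, remove_unreachable, drop_jump_to_next)
    while True:
        new = code
        for step in steps:
            new = step(new)
        if new == code:
            return new
        code = new

# Assumed naive translation of: LOOP IF X > 2 THEN EXIT ELSE X := X + 1 END
NAIVE_LOOP = [
    ("label", "L1"),
    ("if_goto", "X > 2", "L2"),
    ("goto", "L3"),
    ("label", "L2"),
    ("goto", "L4"),          # EXIT
    ("goto", "L1"),          # never executed
    ("label", "L3"),
    ("op", "X := X + 1"),
    ("goto", "L1"),
    ("label", "L4"),
]
```

Calling `peephole(NAIVE_LOOP)` takes three rounds to settle, because removing one jump exposes an unused label, which in turn exposes more unreachable code. That is the reason for the ``Repeat!'' bullet.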
More Peephole Optimizations

$\bullet$ Algebraic Simplification
\begin{code}x + 0 = 0 + x = x
x - 0 = x
x * 1 = 1 * x = x
x/1 = x\end{code}
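A minimal sketch of how these identities might be applied to expression trees. The tuple representation `(op, left, right)` and the function names are assumptions for illustration; operands are taken to be variable names (strings) or integer constants.

```python
def is_const(value, n):
    """True if value is the integer constant n (variable names are strings)."""
    return isinstance(value, int) and value == n

def simplify(expr):
    """Apply x+0, 0+x, x-0, x*1, 1*x and x/1 bottom-up to an expression tree."""
    if not isinstance(expr, tuple):
        return expr                      # a name or a constant: nothing to do
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "+" and is_const(right, 0):
        return left
    if op == "+" and is_const(left, 0):
        return right
    if op == "-" and is_const(right, 0):
        return left
    if op == "*" and is_const(right, 1):
        return left
    if op == "*" and is_const(left, 1):
        return right
    if op == "/" and is_const(right, 1):
        return left
    return (op, left, right)             # no identity applies
```

For instance, `simplify(("+", ("*", "x", 1), 0))` reduces to `"x"`, while `("-", 0, "x")` is left alone because `0 - x` is not equal to `x`.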

$\bullet$ Strength Reduction

Target hardware may have cheaper ways to do certain operations.

E.g., multiplication or division by a power of 2 is better done by shifting (rather than, say, umul).

$\bullet$ Use of machine idioms

Target hardware may have quirks/features that make certain sequences faster (e.g., code using set 372 followed by add).

Local (Basic Block) Optimizations

$\bullet$ Typically applied to IR, after addressing is made explicit, but before machine dependencies appear.

$\bullet$ Most important: Common Subexpression Elimination (CSE)

\begin{code}i := j + 1
a[i] := a[i] + j + 1\end{code}
Avoid duplicating the code for j+1 or the addressing code for a[i].

$\bullet$ Copy Propagation

\begin{code}a := b + 1    $\Rightarrow$    a := b + 1
c := a                         c := a ; maybe can now omit
d := c                         d := a\end{code}
$\bullet$ Algebraic Identities

E.g., use associativity and commutativity of +

\begin{code}a := b + c        $\Rightarrow$    a := b + c
b := c + d + b                     b := b + c + d ; now use CSE\end{code}
$\bullet$ Iterate! Optimizations enable further optimizations.

$\bullet$ Primary technique: build directed acyclic graph (DAG) for basic block.
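The local optimizations above can be sketched with local value numbering, a hash-table technique that finds the same shared nodes a DAG construction would; it is named here as a substitute, not as the method used in the course. The instruction format `(dest, op, a, b)`, the `"copy"` pseudo-operator, and the assumption that every name is assigned at most once within the block are all simplifications made for this example.

```python
COMMUTATIVE = ("+", "*")

def local_cse(block):
    """CSE plus copy propagation over one basic block.

    block is a list of (dest, op, a, b); op "copy" uses only a.
    Assumes each dest is assigned at most once in the block.
    """
    canon = {}       # name -> earlier name known to hold the same value
    available = {}   # expression key -> name that already holds its value
    out = []
    for dest, op, a, b in block:
        a = canon.get(a, a)              # copy propagation on operands
        b = canon.get(b, b)
        if op == "copy":
            canon[dest] = a
            out.append((dest, "copy", a, None))   # may now be dead
            continue
        if op in COMMUTATIVE:
            # Order operands so that b + c and c + b share one key.
            key = (op, *sorted((a, b), key=str))
        else:
            key = (op, a, b)
        if key in available:
            # Common subexpression: reuse the earlier result.
            canon[dest] = available[key]
            out.append((dest, "copy", available[key], None))
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out

# j + 1 is computed twice; the second computation becomes a copy of t1.
EXAMPLE = [
    ("t1", "+", "j", 1),
    ("i", "copy", "t1", None),
    ("t2", "+", "j", 1),
    ("t3", "+", "x", "t2"),
]
```

Running `local_cse(EXAMPLE)` turns `t2` into a copy of `t1` and rewrites `t3` to use `t1` directly. The leftover copies are candidates for dead code elimination, which is one way these passes enable each other.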

CSE Example

\begin{code}Source:
i := j + 1
a[i] := b[i] + j + 1

Naive IR:              After CSE:
t23 := t18 + t22       t23 := t18 + t10 ; &(a[i])
*t23 := t17            *t23 := t17\end{code}
Global (Full Procedure) Optimization

Loop optimizations are most important.

$\bullet$ Code motion: ``hoist'' expensive calculations above the loop.

$\bullet$ Use induction variables and reduction in strength. Change only one index variable on each loop iteration, and choose one that's cheap to change.
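A source-level sketch of both loop optimizations, written as a before/after pair. A compiler would perform these rewrites on IR; Python is used here only to make the two versions easy to compare. The 4-byte element size, the `load` helper standing in for a memory access, and the function names are assumptions for illustration.

```python
WORD = 4  # assumed element size in bytes

def load(array, byte_offset):
    """Stand-in for a memory load at (base address + byte offset)."""
    return array[byte_offset // WORD]

def dot_naive(b, c, n):
    a = 0
    i = 0
    while i < n:
        offset = i * WORD                 # multiplication on every iteration
        a = a + load(b, offset) * load(c, offset)
        i = i + 1
    return a

def dot_optimized(b, c, n):
    a = 0
    limit = n * WORD                      # loop-invariant bound, hoisted
    offset = 0                            # induction variable replacing i
    while offset < limit:                 # test adjusted: i < n becomes offset < n*4
        a = a + load(b, offset) * load(c, offset)
        offset = offset + WORD            # addition replaces the multiplication
    return a
```

Only one variable (`offset`) changes per iteration in the optimized loop, and it changes by a cheap addition. Both functions compute the same result for any `n` up to the length of the arrays.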

Also continue to apply CSE, copy propagation, dead code elimination, etc. on global scale.

Based on flow graph:

$\bullet$ nodes are basic blocks

$\bullet$ edge from basic block A to basic block B if B can be executed immediately after A.
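One common way to build such a flow graph is to find ``leaders'' (the first instruction, every label, and every instruction that follows a jump), cut the code at each leader, and then connect blocks by jump targets and fall-through. The sketch below assumes a hypothetical tuple encoding of instructions and is not taken from the course materials.

```python
# Hypothetical IR encoding: ("label", name), ("goto", target),
# ("if_goto", condition, target), ("return", value), ("op", text).

def build_flow_graph(code):
    """Return (blocks, edges): blocks is a list of instruction lists,
    edges is a set of (from_index, to_index) pairs."""
    if not code:
        return [], set()

    # 1. Leaders start basic blocks.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins[0] == "label":
            leaders.add(i)
        if ins[0] in ("goto", "if_goto", "return") and i + 1 < len(code):
            leaders.add(i + 1)

    # 2. Each block runs from one leader up to the next.
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    blocks = [code[s:e] for s, e in zip(starts, ends)]

    # 3. Map each label to the block that contains it.
    block_of = {}
    for index, block in enumerate(blocks):
        for ins in block:
            if ins[0] == "label":
                block_of[ins[1]] = index

    # 4. Edges: jump targets, plus fall-through where control can continue.
    edges = set()
    for index, block in enumerate(blocks):
        last = block[-1]
        falls_through = last[0] not in ("goto", "return")
        if last[0] in ("goto", "if_goto"):
            edges.add((index, block_of[last[-1]]))
        if falls_through and index + 1 < len(blocks):
            edges.add((index, index + 1))
    return blocks, edges

# A loop shaped like the dot-product example (high-level ops kept as text).
DOT_LOOP = [
    ("op", "a := 0"),
    ("op", "i := 0"),
    ("label", "L2"),
    ("if_goto", "i >= 20", "L4"),
    ("op", "a := a + b[i] * c[i]"),
    ("op", "i := i + 1"),
    ("goto", "L2"),
    ("label", "L4"),
    ("return", "a"),
]
```

For `DOT_LOOP` this yields four blocks: initialization, loop test, loop body, and exit. The back edge from the body to the test is what identifies the loop for the global optimizations.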

Example: Computing dot product (assuming i,a local; b,c global). Local CSE already performed within basic blocks.

\begin{code}a = 0;
for (i = 0; i < 20; i++)
    a = a + b[i] * c[i];
return a;\end{code}
IR for Dot Product

\begin{code}B1 t1 := const 0
t2 := addr a
*t2 := t1
t3 := addr i
*t3 := t...
t22 := const 1 return t25
t23 := t12 + t22
*t11 := t23
goto L2\end{code}
Example: effects of global optimization

$\bullet$ Promote locals a and i to registers.

$\bullet$ Induction variable: replace i with i*4, thus reducing strength of per-loop operation; adjust test accordingly.

$\bullet$ Hoist all constants out of loop.

\begin{code}t1 := const 0       t1 := const 0
t2 := addr a
*t2 := t1           t9 := t1 ; a
t3 ...
t7 := const 80
t8 := addr a
t10 := addr b
t17 := addr c\end{code}

\begin{code}L2:                     L2:
t5 := addr i
t6 := *t5
t7 := const 20
if t6 >= t7 goto L4     ...
...goto L2

L4:                     L4:
t24 := addr a
t25 := *t24
return t25              return t9\end{code}

Andrew P. Tolmach