CS302 Spr'99 Lecture Notes
Lecture 10
Machine Code Generation

$\bullet$ Instruction Selection

$\bullet$ Register Allocation and Assignment

$\bullet$ Optimization

Issues:

$\bullet$ Complexity of Target Machine

$\bullet$ Level of Translation: expression, statement, basic block, routine, program?

$\bullet$ Management of Scarce Resources

Approaches to Instruction Selection

For RISC targets, translate one IR instruction to one or more target instructions.

For CISC targets, translate several IR instructions to one target instruction.


\begin{code}Example Source: a := b (assuming a,b in frame)
\par 3-addr IR: t1 = ...
... %
ld [%
add %
st %
\par moderate RISC: ld [%
st %
\par CISC: move [%
\end{code}

Simplistic SPARC Instruction Selection for PCAT

$\bullet$ Generate instructions directly from AST, using 3-address style.

$\bullet$ Include explicit code for array and record calculations.

$\bullet$ (Alternatively, could do one-to-one translation of IR.)

$\bullet$ Take advantage of SPARC's M[reg+const] addressing mode to generate good code for frame references.

DO THIS: ld [

NOT THIS: add ld [

$\bullet$ Use (small) constants directly where possible.

DO THIS: add

NOT THIS: mov 42, add

$\bullet$ Fill delay slots with nops, unless producing a ``canned'' sequence that can use them.
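The frame-reference and small-constant rules above both reduce to asking whether a value fits in an instruction's immediate field. Here is a minimal sketch. It assumes SPARC's 13-bit signed immediate (simm13, range -4096..4095) and uses %g1 as a scratch register, which is an assumption. The helpers fits_simm13, emit_add_const and emit_load_local are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* SPARC immediates ("simm13") are 13-bit signed: -4096..4095. */
static int fits_simm13(long c) {
    return c >= -4096 && c <= 4095;
}

/* Hypothetical helper: emit "dst := src + c". A small constant goes
   straight into the instruction; a larger one is first built in a
   scratch register (%g1, an assumption) with the synthetic "set". */
static void emit_add_const(char *buf, size_t n, const char *src, long c,
                           const char *dst) {
    if (fits_simm13(c))
        snprintf(buf, n, "add %s,%ld,%s\n", src, c, dst);
    else
        snprintf(buf, n, "set %ld,%%g1\nadd %s,%%g1,%s\n", c, src, dst);
}

/* Hypothetical helper: load a frame variable at offset off from %fp in
   one instruction, using the [reg+const] addressing mode. */
static void emit_load_local(char *buf, size_t n, long off, const char *dst) {
    assert(fits_simm13(off));   /* huge frames would need a scratch reg too */
    snprintf(buf, n, "ld [%%fp%+ld],%s\n", off, dst);
}
```

For example, emit_add_const with constant 42 should yield a single "add %l0,42,%l1". A constant such as 10000 falls back to set followed by a register-register add.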

Register Allocation and Assignment

Task: Manage scarce resources (registers) in environment with imperfect information (static program text) about dynamic program behavior.

General aim is to keep frequently-used values in registers as much as possible, to lower memory traffic. Can have a large effect on program performance.

Variety of approaches are possible, differing in sophistication and in scope of analysis used.

Allocator may be unable to keep every ``live'' variable in registers; must then ``spill'' variables to memory. Spilling adds new instructions, which often affects the allocation analysis, requiring a new iteration.

If spilling is necessary, what should we spill? Some heuristics:

$\bullet$ Don't spill variables used in inner loops.

$\bullet$ Spill variables not used again for ``longest'' time.

$\bullet$ Spill variables which haven't been updated since last read from memory.
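The sketch below shows one possible way to combine the heuristics above into a single choice. The priority order is an illustrative assumption, not a standard rule. It avoids inner-loop values, then prefers the farthest next use, then prefers clean values that need no store. The Candidate fields and pick_spill are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-register information for a spill decision. */
typedef struct {
    bool in_inner_loop;   /* value is used inside an inner loop */
    int  next_use;        /* instructions until next use; large = never */
    bool dirty;           /* updated since last read from memory */
} Candidate;

/* Returns the index of the register to spill. The first pass skips
   inner-loop values; only if all are inner-loop values does the second
   pass consider them. Farther next use wins; ties go to clean values. */
static int pick_spill(const Candidate *c, int n) {
    int best = -1;
    for (int pass = 0; pass < 2 && best < 0; pass++) {
        for (int i = 0; i < n; i++) {
            if (pass == 0 && c[i].in_inner_loop) continue;
            if (best < 0 || c[i].next_use > c[best].next_use ||
                (c[i].next_use == c[best].next_use &&
                 !c[i].dirty && c[best].dirty))
                best = i;
        }
    }
    return best;
}
```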

Simplistic Register Management for PCAT

$\bullet$ Assume variables ``normally'' live in memory.

$\bullet$ Fetch values into registers just before (each) use in an expression.

$\bullet$ Register use never spans statements.

$\bullet$ Use auxiliary table to track register use.


\begin{code}static int reg_used[32];
int getreg(); /* returns first free reg */
void freereg(int r); /* frees specified reg */\end{code}

$\bullet$ Certain SPARC registers are reserved.

$\bullet$ Remember that same register can be used as source and target:
\begin{code}/* assume source operands in t1,t2 */
freereg(t1);
freereg(t2);
int t3 = getreg(); /* may hand back t1 or t2 again */
printf("add %%r%d,%%r%d,%%r%d\n", t1, t2, t3);\end{code}
might produce add %r3,%r5,%r3

$\bullet$ Ignore possibility of spills.

Register Allocation for Expressions

Choice of evaluation order can affect number of registers needed.
\begin{code}Example: (a+b) - ((c+d) - (e+f))

            -
          /   \\
        +       -
       / \\    /   \\
      a   b  +     +
            / \\   / \\
           c   d e   f\end{code}
If we compute left child first, need 4 regs, but doing right child first needs only 3.
\begin{code}Left child first:   Right child first:
load a,r1           load c,r1
load b,r2           load d,r2
add r1,r2,r1        add r1,r2,r1
...                 ...
...,r4,r3           load b,r3
sub r2,r3,r2        add r2,r3,r2
sub r1,r2,r1        sub r2,r1,r1\end{code}
Minimizing Registers Needed to Evaluate Expression Trees

Key idea (Sethi & Ullman): At each node, first evaluate subtree requiring largest number of registers to evaluate. Can then save result of this evaluation in a register while doing other subtree.

1. Label each node with minimum number of registers needed to evaluate subtree.


\begin{code}risc_label(t) {
if isLeaf(t) then
t->label = 1 {\it (depends on in...
...t->label + 1
else
t->label = max(t->left->label,
t->right->label)
}\end{code}

2. Use labels to guide order of code emission; emit code for higher-numbered subtree first.

3. If we run out of registers (when does this happen?), spill to temporary memory locations.
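Steps 1 and 2 can be sketched in C as follows. The sketch assumes the usual RISC form of the labeling: every leaf needs one register, two children with equal labels need one extra, and otherwise the larger label wins. As in step 2, it evaluates the higher-labeled child first. Spilling (step 3) is omitted, so it assumes enough registers. The Node type and the functions su_label and su_gen are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical expression-tree node. */
typedef struct Node {
    char op;                    /* '+', '-', '*', '/'; 0 for a leaf */
    const char *var;            /* variable name for a leaf */
    struct Node *left, *right;
    int label;                  /* registers needed for this subtree */
} Node;

static int max_reg = 0;         /* highest register number emitted */

/* Step 1: label every node (RISC form: each leaf must be loaded). */
static int su_label(Node *t) {
    if (t->op == 0) return t->label = 1;
    int l = su_label(t->left), r = su_label(t->right);
    return t->label = (l == r) ? l + 1 : (l > r ? l : r);
}

static const char *opname(char op) {
    switch (op) {
    case '+': return "add";
    case '-': return "sub";
    case '*': return "mul";
    case '/': return "div";
    }
    assert(!"unknown operator");
    return "?";
}

/* Step 2: leave t's value in r<b>, touching only r<b>..r<b+label-1>.
   The child with the larger label goes first; the other child's value
   then lands in r<b+1>. */
static void su_gen(const Node *t, int b) {
    if (b > max_reg) max_reg = b;
    if (t->op == 0) {
        printf("load %s,r%d\n", t->var, b);
    } else if (t->right->label > t->left->label) {
        su_gen(t->right, b);          /* right value in r<b>   */
        su_gen(t->left, b + 1);       /* left value in r<b+1>  */
        printf("%s r%d,r%d,r%d\n", opname(t->op), b + 1, b, b);
    } else {
        su_gen(t->left, b);           /* left value in r<b>    */
        su_gen(t->right, b + 1);      /* right value in r<b+1> */
        printf("%s r%d,r%d,r%d\n", opname(t->op), b, b + 1, b);
    }
}
```

For the tree (a+b) - ((c+d) - (e+f)), su_label gives the root the label 3. su_gen(&root, 1) should print the right-child-first sequence: load c,r1; load d,r2; add r1,r2,r1; load e,r2; load f,r3; add r2,r3,r2; sub r1,r2,r1; load a,r2; load b,r3; add r2,r3,r2; sub r2,r1,r1.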

Sethi-Ullman Numbering Example
\begin{code}\cdmath
          -$^3$
         /  \\
      +$^2$      -$^3$
     /  \\    /   \\
   a$^1$   b$^1$  +$^2$    +$^2$
           / \\   / \\
          c$^1$ d$^1$ e$^1$ f$^1$\end{code}

Other Issues in Tree Evaluation Order

Some machines (e.g., x86) allow one operand of an instruction (here, the first one) to be a memory reference, possibly with a complex addressing expression, or a constant. The other operand must be a register, which also holds the result (``accumulator'' style):


\begin{code}add 37,r0 ; r0 <- r0 + 37
add b, r1 ; r1 <- r1 + b
sub r0,r1 ; r1 <- r1 - r0
\end{code}

$\bullet$ These machines have different Sethi-Ullman numbering, e.g., right leaves may require no registers at all.

$\bullet$ If we must spill registers, it is better to evaluate left child of non-commutative operators (like -,/) last, because they will require their left operand in a register anyhow.
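The first point above can be made concrete with a minimal sketch of one common accumulator-style labeling. A leaf that is a right child can serve directly as the memory operand, so it gets label 0. A leaf that is a left child must be loaded into the result register, so it gets label 1. Interior nodes combine labels as in the RISC rule. The ANode type and acc_label are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node for a register-memory machine. */
typedef struct ANode {
    char op;                     /* operator; 0 for a leaf */
    struct ANode *left, *right;
    int label;
} ANode;

/* is_left: nonzero if t is a left child (or the root). A right-hand
   leaf needs no register; a left-hand leaf must be loaded. */
static int acc_label(ANode *t, int is_left) {
    if (t->op == 0) return t->label = is_left ? 1 : 0;
    int l = acc_label(t->left, 1), r = acc_label(t->right, 0);
    return t->label = (l == r) ? l + 1 : (l > r ? l : r);
}
```

On the tree (a+b) - ((c+d) - (e+f)), this labeling needs only 2 registers at the root, compared with 3 under the RISC labeling.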

Can use associativity to make trees ``less bushy,'' e.g.
\begin{code}\cdmath
      +$^3$                         +$^2$
     /  \\                       /  \\
   +$^2$    +$^2$      becomes      +$^2$    d$^1$
  / \\    / \\                 /  \\
 a$^1$  b$^1$ c$^1$  d$^1$             +$^2$    c$^1$
                          /  \\
                        a$^1$    b$^1$\end{code}
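The rewrite shown above can be sketched as a rotation that turns x + (y + z) into (x + y) + z. This minimal sketch rotates only '+' and assumes associativity holds, which is true for integer (wraparound) addition but not for floating point. The Node type and the functions su_label and left_linearize are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    char op;                    /* '+', '-', ...; 0 for a leaf */
    const char *var;            /* variable name for a leaf */
    struct Node *left, *right;
    int label;
} Node;

/* RISC-style Sethi-Ullman label (each leaf needs one register). */
static int su_label(Node *t) {
    if (t->op == 0) return t->label = 1;
    int l = su_label(t->left), r = su_label(t->right);
    return t->label = (l == r) ? l + 1 : (l > r ? l : r);
}

/* Make chains of '+' lean left by repeated rotation, reusing nodes. */
static Node *left_linearize(Node *t) {
    if (t->op == 0) return t;
    t->left = left_linearize(t->left);
    t->right = left_linearize(t->right);
    while (t->op == '+' && t->right->op == '+') {
        Node *r = t->right;                 /* t = x + (y + z) */
        Node *y = r->left, *z = r->right;
        r->left = t->left;                  /* r = x + y */
        r->right = y;
        t->left = left_linearize(r);        /* y may itself be a '+' */
        t->right = z;                       /* t = (x + y) + z */
    }
    return t;
}
```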
Basic Blocks

$\bullet$ Extend analysis of register use to program units larger than expressions but still completely analyzable at compile time.

$\bullet$ Basic Block = sequence of instructions with single entry & exit.

$\bullet$ If first instruction of BB is executed, so is remainder of block (in order).

$\bullet$ To calculate basic blocks:

(1) Determine BB leaders (marked $\rightarrow$ in the example below):

(a) First statement in routine
(b) Target of any jump (conditional or unconditional).
(c) Statement following any jump.

(What about subroutine calls?)

(2) Basic block extends from leader to (but not including) next leader (or end of routine).
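Rules (1) and (2) can be sketched over a hypothetical array-based IR in which each instruction is a plain instruction, a jump, or a conditional jump with a target index. The sketch treats subroutine calls as plain instructions, which is one possible answer to the question above and an assumption here. A jump to one past the last instruction (the end of the routine) is ignored.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PLAIN, JUMP, CJUMP } Kind;   /* hypothetical IR kinds */

typedef struct {
    Kind kind;
    int target;      /* index of jump target; unused for PLAIN */
} Instr;

/* Rule (1): mark leaders; returns the number of basic blocks. */
static int find_leaders(const Instr *code, int n, bool *leader) {
    for (int i = 0; i < n; i++) leader[i] = false;
    if (n > 0) leader[0] = true;                     /* (a) first stmt */
    for (int i = 0; i < n; i++) {
        if (code[i].kind == PLAIN) continue;
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = true;           /* (b) jump target */
        if (i + 1 < n)
            leader[i + 1] = true;                    /* (c) after a jump */
    }
    int blocks = 0;
    for (int i = 0; i < n; i++) blocks += leader[i];
    return blocks;
}

/* Rule (2): a block runs from a leader up to (not including) the next
   leader; block_of[i] receives the block number of instruction i. */
static void assign_blocks(const bool *leader, int n, int *block_of) {
    int b = -1;
    for (int i = 0; i < n; i++) {
        if (leader[i]) b++;
        block_of[i] = b;
    }
}
```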

Basic Block Example


\begin{code}prod := 0;
i := 1;
while i <= 20 do
prod := prod + a[i] * b[i];
i := i + 1
end\end{code}


\begin{code}\cdmath
$\rightarrow$\space 1. prod := 0
2. i := 1
$\rightarrow$\sp...
... := prod + t7
12. i := i + 1
13. goto 3
$\rightarrow$\space 14. ---
\end{code}

Register Assignment Within Basic Blocks

Idea: Let program variables and temporaries stay in registers as long as possible.


\begin{code}Example: Naive code: Better code:
a := b + c ld b,r0 ld b,r0
d :...
...r0
add r0,r1,r1 st r0,a
st r1,d
ld b,r0
add r0,1,r0
st r0,a \end{code}
$\bullet$ Simplest to operate on one basic block at a time; can extend to multiple blocks with some effort.

$\bullet$ Assume an infinite supply of registers; later ``spill'' some to memory if required.

$\bullet$ Registers behave like a cache for memory locations.

$\bullet$ Nasty problems for source-level debuggers and dump utilities - where is that variable?!?

Liveness

To determine how long to keep a given variable in a register, need to know the range of instructions for which the variable is live.

A variable is live immediately following an instruction if its current value may be used later, i.e., if along some path it is read before being overwritten.

It's easy to calculate live ranges within a basic block, just by working backwards through the block.

Can assume that all user variables are live at the end of a basic block, i.e., that their values may be used in a subsequent block. (If doing BB-level register allocation, must save the values back in memory at the end of the block anyhow.)

Treat temporaries like user variables, except that they are assumed dead at end of BB.
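The backward pass described above can be sketched as follows. Variables are small integers, and a set of variables is a bitmask. For each instruction x := y op z, working backwards, live-before = (live-after minus x) plus y and z. User variables start out live at the end of the block and temporaries start out dead. The Quad type and bb_liveness are hypothetical. The test uses the three-address form t := a-b; u := a-c; v := t+u; d := v+u of the later example, and those temporary names are an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t VarSet;                 /* bit v set <=> variable v live */
#define BIT(v) ((VarSet)1u << (v))

typedef struct { int x, y, z; } Quad;    /* x := y op z; -1 = no operand */

/* One backward pass over a basic block. live_at_end should contain the
   user variables only (temporaries are dead at the end of the block). */
static void bb_liveness(const Quad *q, int n, VarSet live_at_end,
                        VarSet *live_after) {
    VarSet live = live_at_end;
    for (int i = n - 1; i >= 0; i--) {
        live_after[i] = live;
        if (q[i].x >= 0) live &= ~BIT(q[i].x);   /* x defined here */
        if (q[i].y >= 0) live |= BIT(q[i].y);    /* y, z used here */
        if (q[i].z >= 0) live |= BIT(q[i].z);
    }
}
```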

Can improve accuracy by calculating liveness over entire routines, including control flow statements, not just BBs.

Doing this requires iterative data-flow analysis, and the result is only a conservative approximation to true liveness.
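The iterative analysis can be sketched with the standard equations in[B] = use[B] + (out[B] - def[B]) and out[B] = union of in[S] over the successors S of B, repeated until nothing changes. The result is conservative because every CFG edge is assumed executable. The Block layout, with at most two successors, and cfg_liveness are hypothetical. The test models the loop of the basic-block example, assuming the exit block reads prod.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t VarSet;

typedef struct {
    VarSet use;      /* vars read in the block before any write */
    VarSet def;      /* vars written in the block */
    int succ[2];     /* successor block indices; -1 = none */
} Block;

/* Iterate to a fixed point; returns the number of passes taken. Any
   visiting order reaches the same result; reverse order often needs
   fewer passes for a backward problem like this one. */
static int cfg_liveness(const Block *b, int n, VarSet *in, VarSet *out) {
    for (int i = 0; i < n; i++) in[i] = out[i] = 0;
    int passes = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        passes++;
        for (int i = n - 1; i >= 0; i--) {
            VarSet o = 0;
            for (int k = 0; k < 2; k++)
                if (b[i].succ[k] >= 0) o |= in[b[i].succ[k]];
            VarSet nin = b[i].use | (o & ~b[i].def);
            if (o != out[i] || nin != in[i]) changed = true;
            out[i] = o;
            in[i] = nin;
        }
    }
    return passes;
}
```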

BB Code Generation using Liveness

Can combine code generation with ``greedy'' register allocation: bring each variable into a register when first needed, and leave it there as long as it's needed (if possible).

Maintain register descriptors saying which variable is in each register, and address descriptors saying where (in memory and/or a register) each variable is. For each IR instruction x := y op z (other instructions similar):

1. If y isn't in a register, load it into a free one, updating descriptors.

2. Similarly for z.

3. If y and/or z are no longer live following this instruction, mark their registers as free.

4. Choose a free register for x, updating descriptors.

5. Generate instruction op ry,rz,rx.

For the special case x := y, load y into a register, if necessary, and then mark that register as holding x too.

Must now be careful not to free a register unless none of its associated variables is live.
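Steps 1 to 5, the x := y copy case, and the end-of-block stores can be sketched as follows. The register descriptor is a bitmask of the variables held by each register. The address descriptor records the register, if any, and whether memory is up to date. The sketch assumes 8 registers, variables numbered 0..6 (a b c d t u v), and live-after sets supplied by the caller. Spilling is not handled. Every name here is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NREGS 8
#define NVARS 7
typedef uint32_t VarSet;
#define BIT(v) ((VarSet)1u << (v))

static const char *var_name = "abcdtuv";   /* hypothetical numbering */
static VarSet reg_holds[NREGS];  /* register descriptor */
static int var_reg[NVARS];       /* address descriptor: reg or -1 */
static int var_in_mem[NVARS];    /* address descriptor: memory current? */

static void init(void) {
    for (int r = 0; r < NREGS; r++) reg_holds[r] = 0;
    for (int v = 0; v < NVARS; v++) { var_reg[v] = -1; var_in_mem[v] = 1; }
}

/* A register is free only when it holds no variable at all. */
static int free_reg(void) {
    for (int r = 0; r < NREGS; r++)
        if (reg_holds[r] == 0) return r;
    assert(!"out of registers: spilling not handled in this sketch");
    return -1;
}

/* Steps 1-2: make sure v is in a register, loading it if needed. */
static int ensure_in_reg(int v) {
    if (var_reg[v] < 0) {
        int r = free_reg();
        printf("ld %c,r%d\n", var_name[v], r);
        reg_holds[r] = BIT(v);
        var_reg[v] = r;
    }
    return var_reg[v];
}

/* Forget that v is in a register (v is dead or about to be overwritten). */
static void detach(int v) {
    if (var_reg[v] >= 0) {
        reg_holds[var_reg[v]] &= ~BIT(v);
        var_reg[v] = -1;
    }
}

/* x := y op z; live_after = variables live after this instruction. */
static void gen_binop(int x, int y, const char *op, int z, VarSet live_after) {
    int ry = ensure_in_reg(y);                       /* step 1 */
    int rz = ensure_in_reg(z);                       /* step 2 */
    if (!(live_after & BIT(y))) detach(y);           /* step 3 */
    if (!(live_after & BIT(z))) detach(z);
    detach(x);                   /* old value of x is overwritten */
    int rx = free_reg();                             /* step 4 */
    reg_holds[rx] = BIT(x);
    var_reg[x] = rx;
    var_in_mem[x] = 0;           /* memory copy of x is now stale */
    printf("%s r%d,r%d,r%d\n", op, ry, rz, rx);      /* step 5 */
}

/* x := y: no instruction; y's register now holds x as well. */
static void gen_copy(int x, int y, VarSet live_after) {
    int ry = ensure_in_reg(y);
    detach(x);
    reg_holds[ry] |= BIT(x);
    var_reg[x] = ry;
    var_in_mem[x] = 0;
    if (!(live_after & BIT(y))) detach(y);
}

/* End of block: store live user variables whose memory copy is stale. */
static void end_block(VarSet live_at_end) {
    for (int v = 0; v < NVARS; v++)
        if ((live_at_end & BIT(v)) && !var_in_mem[v] && var_reg[v] >= 0) {
            printf("st r%d,%c\n", var_reg[v], var_name[v]);
            var_in_mem[v] = 1;
        }
}
```

For t := a-b; u := a-c; v := t+u; d := v+u, with a, b, c, d live at the end, this should print: ld a,r0; ld b,r1; sub r0,r1,r2; ld c,r3; sub r0,r3,r4; add r2,r4,r2; add r2,r4,r2; st r2,d. Because the user variables count as live to the end of the block, the sketch keeps a, b and c resident. A refinement could free the register of an unmodified user variable after its last use in the block, since memory already holds its value.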

Example


\begin{code}Source code: d := (a-b) + (a-c) + (a-c)
\par Live after inst:
a b ...
...:r1
\par d := v + u add r0,r1,r0 r0:d a,b,c:mem
st r0,d r1:v d:r0,mem\end{code}
Register Interference Graphs

Mixing instruction selection and register allocation gets confusing; need a more systematic way to look at the problem.

$\bullet$ Initially generate code assuming an infinite number of ``logical'' registers; calculate live ranges

Previous Example:
\begin{code}Live after instr.
ld a,t0 ; a:t0 t0
ld b,t1 ; b:t1 t0 t1
sub t0,t1...
...; u:t4 t2 t4
add t2,t4,t5 ; v:t5 t4 t5
add t5,t4,t6 ; d:t6 t6
st t6,d
\end{code}

$\bullet$ Build a register interference graph, which has

- a node for each logical register.

- an edge between two nodes if the corresponding registers are simultaneously live.
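The construction above can be sketched with one bitmask of live logical registers per instruction and an adjacency matrix. Two registers get an edge whenever both appear in the same live-after set. The middle rows of the previous example's table are elided. The sets used for them in the test (after sub t0,t1,t2: t0 t2; after ld c,t3: t0 t2 t3; after sub t0,t3,t4: t2 t4) are an assumption consistent with the drawing below. Graph and build_interference are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXREG 32
typedef uint32_t RegSet;       /* bit i set <=> logical register t<i> live */

typedef struct {
    int n;
    bool adj[MAXREG][MAXREG];  /* adjacency matrix */
} Graph;

/* Add an edge between every pair of logical registers that are live
   together after some instruction. */
static void build_interference(Graph *g, int nregs,
                               const RegSet *live_after, int ninstrs) {
    assert(nregs <= MAXREG);
    g->n = nregs;
    for (int i = 0; i < nregs; i++)
        for (int j = 0; j < nregs; j++) g->adj[i][j] = false;
    for (int k = 0; k < ninstrs; k++)
        for (int i = 0; i < nregs; i++)
            for (int j = i + 1; j < nregs; j++)
                if (((live_after[k] >> i) & 1) && ((live_after[k] >> j) & 1))
                    g->adj[i][j] = g->adj[j][i] = true;
}
```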

Coloring Interference Graphs

Interference Graph Example:


\begin{code}________________
/ \\
t0 ----- t1 t2
\vert ______________//
\vert / ______/
\vert/ /
t3 t4------ t5 t6\end{code}
A coloring of a graph is an assignment of colors to nodes such that no two connected nodes have the same color. (Like coloring a map, where nodes=countries and edges connect countries with common border.)

Suppose we have k physical registers available. Then the aim is to color the interference graph with k or fewer colors; such a coloring lets us assign logical registers to physical registers without spilling.

In general, determining whether a graph can be k-colored is hard: the problem is NP-complete for any fixed $k \ge 3$, so exact algorithms are probably exponential.

But a simple heuristic will usually find a k-coloring if there is one.

Graph Coloring Heuristic

1. Choose a node with fewer than k neighbors.

2. Remove that node. Note that if we can color the resulting graph with k colors, we can also color the original graph, by giving the deleted node a color different from all its neighbors.

3. Repeat until either

$\bullet$ there are no nodes with fewer than k neighbors, in which case we must spill; or

$\bullet$ the graph is gone, in which case we can color the original graph by adding the deleted nodes back in one at a time and coloring them.
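The heuristic above can be sketched as follows. The sketch removes nodes onto a stack and then colors them in reverse order of removal. When a node comes back, only the neighbors it had at removal time are already colored, and there are fewer than k of them, so a free color always exists. The test uses our reading of the example drawing below: edges t0-t1, t0-t2, t0-t3, t2-t3, t2-t4, t4-t5, with t6 isolated. Graph, graph_init, add_edge and color_graph are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXREG 32

typedef struct {
    int n;
    bool adj[MAXREG][MAXREG];
} Graph;

static void graph_init(Graph *g, int n) {
    assert(n <= MAXREG);
    g->n = n;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) g->adj[i][j] = false;
}

static void add_edge(Graph *g, int a, int b) {
    g->adj[a][b] = g->adj[b][a] = true;
}

/* Returns true and fills color[] with 0..k-1 on success; returns false
   if it gets stuck, i.e., some logical register must be spilled. */
static bool color_graph(const Graph *g, int k, int *color) {
    bool removed[MAXREG] = {false};
    int stack[MAXREG], sp = 0;
    while (sp < g->n) {                 /* steps 1-3: remove nodes */
        int pick = -1;
        for (int v = 0; v < g->n && pick < 0; v++) {
            if (removed[v]) continue;
            int deg = 0;
            for (int w = 0; w < g->n; w++)
                if (!removed[w] && g->adj[v][w]) deg++;
            if (deg < k) pick = v;
        }
        if (pick < 0) return false;     /* no node with < k neighbors */
        removed[pick] = true;
        stack[sp++] = pick;
    }
    for (int v = 0; v < g->n; v++) color[v] = -1;
    while (sp > 0) {                    /* add nodes back and color them */
        int v = stack[--sp];
        bool used[MAXREG] = {false};
        for (int w = 0; w < g->n; w++)
            if (g->adj[v][w] && color[w] >= 0) used[color[w]] = true;
        int c = 0;
        while (used[c]) c++;
        assert(c < k);
        color[v] = c;
    }
    return true;
}
```

On this graph the sketch should succeed with k = 3 and report failure with k = 2.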

Example:


\begin{code}________________
/ \\
t0 ----- t1 t2
\vert ______________//
\vert / ______/
\vert/ /
t3 t4------ t5 t6\end{code}
Finds a 3-coloring. There cannot be a 2-coloring (why not?).

Each ``color'' corresponds to a physical register, so 3 registers will do for this example.


Andrew P. Tolmach
1999-05-16