Basic Blocks

This chapter outlines a two-pass algorithm for partitioning a sequence of machine instructions into Basic Blocks (BBs). Pass 1 identifies block headers and exit points; pass 2 assigns each BB so found a unique number. The numbers are arbitrary, but numbering in sequential order is a plausible convention. Under that convention, the physically first instruction has BB number 1. Zero or more subsequent instructions share that BB number until a new BB starts or the end of the instruction stream I is reached. Each new BB receives a number 1 higher than the previous one.

Input is a sequential stream of instructions I, plus the index start of the start instruction. Output is an ordered sequence of BB numbers, one associated with each instruction in I.

The instruction indexed start effectively is the destination of an invisible branch, executed by the Operating System.

 

Problem of Basic Block Analysis

Partitioning instructions into Basic Blocks (BB) is a useful aid in numerous optimizations. Some optimizations are performed solely within one BB; these are called local optimizations. Others are performed across all BBs of a complete function; those are called global optimizations, simply because they go beyond the locality of one BB.

The essence of a BB, a sequence of 1 or more instructions with a unique entry point and a single exit point, is the absence of any control flow within the block. Hence it is easy within a BB to answer questions of the kind: is the subexpression x identical to the other x in this BB? Or: Since there are two assignments to object x in this BB, and no intermediate use, can the first assignment of x simply be eliminated?
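The second question above can be answered mechanically once the instructions are known to form a single BB. The following Python sketch is only an illustration: the pair layout `(target, operands)` and the name `drop_dead_stores` are assumptions, and it presumes that an assignment has no side effects and that variables are not aliased.

```python
def drop_dead_stores(block):
    """Remove assignments that are overwritten before any use.

    `block` is a list of (target, operands) pairs in execution order and
    must be ONE basic block: no branches in, no branches out.
    """
    pending = {}   # variable -> index of its latest, still unused assignment
    dead = set()   # indices of assignments that can be dropped
    for index, (target, operands) in enumerate(block):
        for name in operands:
            pending.pop(name, None)      # a use keeps the earlier store alive
        if target in pending:
            dead.add(pending[target])    # overwritten with no use in between
        pending[target] = index
    return [ins for index, ins in enumerate(block) if index not in dead]


block = [
    ("x", ("a",)),        # x := a      (dead: x is reassigned before any use)
    ("y", ("b",)),        # y := b
    ("x", ("c",)),        # x := c
    ("z", ("x", "y")),    # z := x + y
]
print(drop_dead_stores(block))
```

Only the first assignment to x is dropped. The last assignment to each variable is always kept, because its value may still be used after the block. The single forward scan is valid only because no control flow can enter or leave in the middle of the block.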

 

Synopsis

 

Definitions

See also the section on Control Flow Graphs (cfg).

Basic Block (BB):

Sequence of 1 or more instructions with a unique header (entry point) and a single exit point. Note that the 2 don't need to be distinct.

Entry Point:

First instruction in a BB. Alias: Header, or leader. Antonym: Exit point.

Exit Point:

Last instruction in a BB. The exit point is not always a control transfer instruction; it may be a sequential operation that falls through to the next, but the next one may be labeled. Antonym: Entry point.

Extended Basic Block:

Sequence of 2 or more instructions with a unique entry point and two or more points of exit.

Global Optimization:

Optimization across multiple BBs, possibly all BBs of a complete function.

Label:

Imaginary or real mark at an instruction signifying that this is the destination of a control transfer instruction.

Local Optimization:

Optimization restricted to a single BB.

Start Instruction:

Instruction that is the destination of some outside branch, such as one executed by the Operating System.

 

Generation of Entry and Exit Points

Instruction @ pc      Generates implicitly or explicitly, at pc or destination
--------------------  --------------------------------------------------------
Abort                 explicit exit point at pc; actually an end point in cfg
Return                explicit exit point at pc; actually an end point in cfg
Halt                  explicit exit point at pc; actually an end point in cfg
Call                  explicit exit point at pc
                      explicit entry point at destination
                      implicit exit point at destination - if exists
                      implicit entry point at pc + 1
Branch                explicit exit point at pc
                      explicit entry point at destination
                      implicit entry point at destination - if exists
Conditional branch    explicit exit point at pc
                      explicit entry point at destination
                      explicit entry point at pc + 1
                      implicit exit point at destination - if exists
All others            No entry/exit point generation; just fall through
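The table can be read as a rule that maps each instruction to the entry points it generates. The Python sketch below is one possible encoding, not a definitive one: the pair layout `(opcode, dest)`, the opcode groupings, and the name `find_leaders` are assumptions made for illustration, with opcode names borrowed from the listings in the examples of this chapter. It records an entry point after every control transfer, including unconditional branches and returns, which is what the BB numbering in those listings suggests. Indexed branches are not modeled.

```python
# Hypothetical opcode groups, named after the quads in this chapter's listings.
BRANCH = {"br"}                          # unconditional branch
COND_BRANCH = {"brf", "brt", "brinz"}    # conditional branch
CALL = {"fcall", "pcall"}
END = {"fret", "pret", "halt", "abort"}  # end points in the cfg
TRANSFER = BRANCH | COND_BRANCH | CALL | END


def find_leaders(code, start):
    """Return the sorted entry points (leaders) of `code`.

    `code` is a list of (opcode, dest) pairs; `dest` is a pc or None.
    """
    leaders = {start}
    for pc, (opcode, dest) in enumerate(code):
        if opcode not in TRANSFER:
            continue                      # all others: just fall through
        if dest is not None:
            leaders.add(dest)             # explicit entry at the destination
        if pc + 1 < len(code):
            leaders.add(pc + 1)           # entry at pc + 1 after any transfer
    return sorted(leaders)


# Example 1 (fact), reduced to opcode and destination.
FACT = [
    ("fenter", None), ("lt", None), ("brf", 5), ("fret", None), ("br", 10),
    ("subi", None), ("param", None), ("fcall", 0), ("multi", None),
    ("fret", None), ("abort", None),
]

print(find_leaders(FACT, start=0))   # [0, 3, 4, 5, 8, 10]
```

The six leaders correspond to the first instructions of BB 1 through BB 6 in Example 1. Exit points need no separate bookkeeping in this encoding: every control transfer is one, and so is any instruction immediately preceding a leader.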

 

General Description of Basic Block Analysis

A Basic Block is a sequence of 1 or more consecutive instructions, starting with a unique entry instruction (header, aka leader) and ending with an exit instruction that either leads to another BB or ends the whole program (e.g. a Halt instruction). Note that single-instruction BBs are possible. Entry instructions are created explicitly by being the destination of a branch or call. They are also created by fall-through in the case of conditional branches. Exit instructions are created explicitly by branches and calls, or implicitly when the next instruction is a new entry (aka leader or header).

Instructions that end a BB include: branches, conditional branches, indexed branches (for branch tables, Case Statements), function calls, procedure calls, exceptions, halt, function return, and procedure return. The destination of a branch is called a leader, as that instruction leads a new BB. Also, the first instruction of the main function is a leader; it is initiated by the surrounding system (jumped to by OS code). Note that statement labels, procedure entries, and similar instructions do not by themselves create leaders (or end a BB). Instead, it is the call instruction targeting a procedure entry (penter) instruction, and the branch targeting a label, that create the leaders. In fact, if I contains a penter instruction that is never made a leader, that instruction, and hence the complete associated procedure body, is unreachable code.

At the end of the BB analysis each instruction in I must be either reached, unreachable, leader, or unreachable leader. The purpose of unreachable leaders is explained below. Unreachable code can be eliminated from I.

What are unreachable leaders for? An unconditional branch implicitly creates a new BB entry at the instruction that follows it. However, this new BB is live only if some other instruction transfers control to that place. If none does, the complete BB is unreachable. To keep the counting of BBs intact, it is convenient to mark such places as unreachable leaders.

The algorithm to identify BBs is a simple two-pass process. Pass 1 identifies all reachable instructions in I and all leaders. Pass 2 assigns a BB number to each BB in sequential order from 1 to n.

 

Pass 1

1.) Initialize all instructions as unreachable.

2.) Mark instruction I[ start ] a leader; leader implies reachable.

3.) For each instruction I[ pc ] in I perform the following analysis:

Any other instruction: mark the successor of any non-control-flow instruction as reached. Rationale: If control ever reaches the current instruction at pc, then during execution there will be a fall-through, reaching the next one at pc+1. On the other hand, if the header is never reached, it will never be labeled a leader, and hence all code in that BB is unreachable, and the unreachable code is detectable.
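The per-opcode cases of pass 1 are not spelled out above, so the following Python sketch is only one way to arrive at the four states reached, unreachable, leader, and unreachable leader. It is not the linear scan described in the text: block boundaries are found by a linear scan, but reachability is computed by a worklist traversal from start, a substituted technique chosen here because it needs no later iteration. The names `successors` and `pass1`, the `(opcode, dest)` layout, and the opcode groups are assumptions; a call is assumed to return to pc + 1.

```python
BRANCH = {"br"}
COND_BRANCH = {"brf", "brt", "brinz"}
CALL = {"fcall", "pcall"}
END = {"fret", "pret", "halt", "abort"}


def successors(code, pc):
    """Program counters that control may reach directly from `pc`."""
    opcode, dest = code[pc]
    if opcode in END:
        return []
    if opcode in BRANCH:
        return [dest]
    nxt = [pc + 1] if pc + 1 < len(code) else []
    if opcode in COND_BRANCH or opcode in CALL:
        return [dest] + nxt       # taken or callee path, plus fall-through
    return nxt


def pass1(code, start):
    """Classify every instruction; returns one status string per pc."""
    # Block boundaries: start, every destination, and pc + 1 after a transfer.
    leaders = {start}
    for pc, (opcode, dest) in enumerate(code):
        if opcode in BRANCH | COND_BRANCH | CALL | END:
            if dest is not None:
                leaders.add(dest)
            if pc + 1 < len(code):
                leaders.add(pc + 1)
    # Reachability: follow control flow from the start instruction only.
    reached, work = set(), [start]
    while work:
        pc = work.pop()
        if pc not in reached:
            reached.add(pc)
            work.extend(successors(code, pc))
    status = []
    for pc in range(len(code)):
        if pc in leaders:
            status.append("leader" if pc in reached else "unreachable leader")
        else:
            status.append("reached" if pc in reached else "unreachable")
    return status


# Example 1 (fact), reduced to opcode and destination.
FACT = [
    ("fenter", None), ("lt", None), ("brf", 5), ("fret", None), ("br", 10),
    ("subi", None), ("param", None), ("fcall", 0), ("multi", None),
    ("fret", None), ("abort", None),
]

for pc, state in enumerate(pass1(FACT, start=0)):
    print(pc, state)
```

For the factorial code of Example 1, pc 4 and pc 10 come out as unreachable leaders and every other instruction as leader or reached, which agrees with the two unreachable-code reports in that listing.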

Pass 2

Process all instructions I[ pc ] in sequence, starting at pc = 0. Determine whether instruction I[ pc ] is reachable, a leader, or an unreachable leader. Check the opcode of each instruction I[ pc ]. For any operation I[ pc ].opcode do:

After checking the opcode of instruction I[ pc ], set that instruction's BB number to bb. Note that some instructions may appear live only because they are referenced solely by instructions that themselves reside in unreachable code. Hence unreachable code removal is an iterative process that must proceed until no further instructions can be removed. This should be done if you are not building a cfg, as the latter trivially detects unreachable code.
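The numbering step itself can be sketched in a few lines of Python. The sketch assumes that pass 1 has already produced one status string per instruction, using the four states named earlier; the function name `pass2` and the list-of-strings representation are illustrative assumptions, and the per-opcode checks mentioned above are not modeled.

```python
def pass2(status):
    """Assign BB numbers 1..n in sequential order.

    Returns (numbers, dead_blocks): the BB number of every instruction,
    and the numbers of those BBs whose header is not reachable.
    """
    bb = 0
    numbers, dead_blocks = [], []
    for pc, state in enumerate(status):
        # The physically first instruction always opens BB 1; after that,
        # every leader or unreachable leader opens a new BB.
        if pc == 0 or state in ("leader", "unreachable leader"):
            bb += 1
            if state in ("unreachable leader", "unreachable"):
                dead_blocks.append(bb)
        numbers.append(bb)
    return numbers, dead_blocks


# Statuses for the factorial code of Example 1, pc 0 through pc 10.
FACT_STATUS = [
    "leader", "reached", "reached", "leader", "unreachable leader",
    "leader", "reached", "reached", "leader", "reached",
    "unreachable leader",
]

numbers, dead_blocks = pass2(FACT_STATUS)
print(numbers)      # [1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 6]
print(dead_blocks)  # [3, 6]
```

Counting unreachable leaders as block starts is what keeps the numbering dense: BB 3 and BB 6 receive numbers even though they can be eliminated afterwards.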

 

Example 1

Example 1: Factorial Function fact( arg )

function fact( arg : integer ) return integer is

begin -- fact

if arg < 1 then

return 1;

else

return fact( arg - 1 ) * arg;

end if;

end fact;

BB 1: -> pc 0: fenter fact 36, 32, -|

BB 1: pc 1: lt arg(0,-1), 1, R(0)

BB 1: pc 2: brf R(0), -|, Q(5)

BB 2: -> pc 3: fret fact 1, 32, 1

BB 3: -> pc 4: br -|, -|, Q(10)

BB 4: -> pc 5: subi arg(0,-1), 1, R(0)

BB 4: pc 6: param R(0), -|, -|

BB 4: pc 7: fcall fact 1, R(1), Q(0)

BB 5: -> pc 8: multi R(1), arg(0,-1), R(2)

BB 5: pc 9: fret fact 1, 32, R(2)

BB 6: -> pc 10: abort(FUNCTION_ABORT_EXC)

<><> Unreachable code at pc = 4, BB(3)

<><> Unreachable code at pc = 10, BB(6)

 

The unreachable code at instruction 4 is caused by the Then Clause being terminated via a Return Statement. The explicit transfer of control to the End If will therefore never happen. Similarly, the abort at instruction 10 in Example 1, meant to catch a fall-through to the function's end, will never be executed; the function return prevents that.

Note that here we see an example in which unreachable code (quad 10) is referenced solely by an instruction that is itself unreachable code, quad 4. Pass 1 of the BB algorithm will erroneously mark quad 10 a leader, and hence reached. The subsequent unreachable code removal and renewed Basic Block Analysis will render quad 10 unreachable as well.
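The two-step effect described here can be reproduced with a small Python sketch of iterative removal. It is an illustration under assumptions, not the chapter's implementation: the names `successors` and `removal_rounds`, the `(opcode, dest)` layout, and the opcode groups are hypothetical, and an instruction counts as referenced when it is the start instruction, a branch or call destination, or the fall-through successor of a surviving instruction.

```python
END = {"fret", "pret", "halt", "abort"}
BRANCH = {"br"}
WITH_DEST = {"br", "brf", "brt", "brinz", "fcall", "pcall"}


def successors(code, pc):
    """Instructions that `pc` refers to: its destination and fall-through."""
    opcode, dest = code[pc]
    out = []
    if opcode in WITH_DEST:
        out.append(dest)
    if opcode not in END and opcode not in BRANCH and pc + 1 < len(code):
        out.append(pc + 1)
    return out


def removal_rounds(code, start):
    """Repeatedly delete instructions that no surviving instruction refers to.

    Returns a list with the pcs removed in each round.
    """
    live = set(range(len(code)))
    rounds = []
    while True:
        referenced = {start}
        for pc in live:
            referenced.update(successors(code, pc))
        removed = sorted(live - referenced)
        if not removed:
            return rounds
        rounds.append(removed)
        live -= set(removed)


# Example 1 (fact), reduced to opcode and destination.
FACT = [
    ("fenter", None), ("lt", None), ("brf", 5), ("fret", None), ("br", 10),
    ("subi", None), ("param", None), ("fcall", 0), ("multi", None),
    ("fret", None), ("abort", None),
]

print(removal_rounds(FACT, start=0))   # [[4], [10]]
```

Quad 4 disappears in the first round; only then does quad 10 lose its sole reference and disappear in the second. In this sketch a dead region that refers to itself is never removed, for instance the recursive call inside proc2 of Example 2; a traversal that starts at the start instruction, or a cfg, is needed for that case.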

 

Example 2

Example 2: Mutual Recursion

procedure main is

procedure proc1 is

begin -- proc1

if false then

proc1;

end if;

end proc1;

procedure proc2 is

begin -- proc2

if false then

proc2;

else

proc1;

end if;

end proc2;

begin -- main

proc1;

end main;

BB 1: -> pc 0: penter proc1 4, -|, -|

BB 1: pc 1: brf false -|, Q(3)

BB 2: -> pc 2: pcall proc1 1, -|, Q(0)

BB 3: -> pc 3: pret proc1 0, -|, -|

BB 4: -> pc 4: penter proc2 4, -|, -|

BB 4: pc 5: brf false -|, Q(8)

BB 5: -> pc 6: pcall proc2 1, -|, Q(4)

BB 6: -> pc 7: br -|, -|, Q(9)

BB 7: -> pc 8: pcall proc1 1, -|, Q(0)

BB 8: -> pc 9: pret proc2 0, -|, -|

BB 9: -> pc 10: penter main 4, -|, -|

BB 9: pc 11: pcall proc1 0, -|, Q(0)

BB 10: ->pc 12: HALT

<><> Unreachable code: pc 4 BB(4) to pc 9 BB(8)

 

Since the penter instruction at pc = 4 for procedure proc2 is unreachable, the whole procedure, quads 4 through 9, is superfluous. Note that the Basic Block Analysis applied here does detect this.

 

Example 3

 Example 3: Three kinds of loop

 

Procedure main is

i : integer := 0; even, odd : integer := 0;

begin -- main

while i < 10 loop

if i Mod 2 = 0 then even := even + 1; else odd := odd + 1; end if;

i := i + 1;

end loop;

 

i := 1;

<< label >>

if i rem 2 = 0 then even := even + 1; else odd := odd + 1; end if;

i := i + 1;

if i < 10 then goto label; end if;

 

for inx in 1 .. 10 loop

if inx mod 2 = 0 then even := even + 1; else odd := odd + 1; end if;

end loop;

end main;

 

BB 1: -> pc 0: penter main 7, -|, -|

BB 1: pc 1: assign 0, -|, I(0,4)

BB 1: pc 2: assign 0, -|, even(0,5)

BB 1: pc 3: assign 0, -|, odd(0,6)

BB 2: -> pc 4: lt I(0,4), 10, R(0)

BB 2: pc 5: brf R(0), -|, Q(16)

BB 3: -> pc 6: modi I(0,4), 2, R(0)

BB 3: pc 7: brinz R(0), 0, Q(11)

BB 4: -> pc 8: addi even(0,5), 1, R(0)

BB 4: pc 9: assign R(0), -|, even(0,5)

BB 4: pc 10: br -|, -|, Q(13)

BB 5: -> pc 11: addi odd(0,6), 1, R(0)

BB 5: pc 12: assign R(0), -|, odd(0,6)

BB 6: -> pc 13: addi I(0,4), 1, R(0)

BB 6: pc 14: assign R(0), -|, I(0,4)

BB 6: pc 15: br -|, -|, Q(4)

BB 7: -> pc 16: assign 1, -|, I(0,4)

BB 8: -> pc 17: remi I(0,4), 2, R(0)

BB 8: pc 18: brinz R(0), 0, Q(22)

BB 9: -> pc 19: addi even(0,5), 1, R(0)

BB 9: pc 20: assign R(0), -|, even(0,5)

BB 9: pc 21: br -|, -|, Q(24)

BB 10: ->pc 22: addi odd(0,6), 1, R(0)

BB 10: pc 23: assign R(0), -|, odd(0,6)

BB 11: ->pc 24: addi I(0,4), 1, R(0)

BB 11: pc 25: assign R(0), -|, I(0,4)

BB 11: pc 26: lt I(0,4), 10, R(0)

BB 11: pc 27: brf R(0), -|, Q(29)

BB 12: ->pc 28: br -|, -|, Q(17)

BB 13: ->pc 29: start_for -|, -|, -|

BB 13: pc 30: subi 1, 1, inx(0,7)

BB 13: pc 31: assign 10, -|, anonymous(0,8)

BB 14: ->pc 32: for inx(0,7), anonymous(0,8), Q(41)

BB 15: ->pc 33: modi inx(0,7), 2, R(0)

BB 15: pc 34: brinz R(0), 0, Q(38)

BB 16: ->pc 35: addi even(0,5), 1, R(0)

BB 16: pc 36: assign R(0), -|, even(0,5)

BB 16: pc 37: br -|, -|, Q(40)

BB 17: ->pc 38: addi odd(0,6), 1, R(0)

BB 17: pc 39: assign R(0), -|, odd(0,6)

BB 18: ->pc 40: br -|, -|, Q(32)

BB 19: ->pc 41: end_for -|, -|, -|

BB 19: pc 42: HALT

 

Example 4

Example 4: Nested Case Statements

procedure main is

c : character := 'n';

i : integer := 109;

begin -- main

case c is

when 'a' .. 'c' => return;

when 'e' .. 'f' | 'g' | 'i' .. 'j' => i := 1;

when others =>

case i is

when 108 .. 109 => return;

end case;

end case;

end main;

BB 1: -> pc 0: penter main 6, -|, -|

BB 1: pc 1: assign 'n', -|, c(0,4)

BB 1: pc 2: assign 109, -|, i(0,5)

BB 1: pc 3: br -|, -|, Q(22)

BB 2: -> pc 4: pret main 0, -|, -|

BB 3: -> pc 5: br -|, -|, Q(38)

BB 4: -> pc 6: assign 1, -|, i(0,5)

BB 4: pc 7: br -|, -|, Q(38)

BB 5: -> pc 8: br -|, -|, Q(11)

BB 6: -> pc 9: pret main 0, -|, -|

BB 7: -> pc 10: br -|, -|, Q(21)

BB 8: -> pc 11: lt i(0,5), 108, R(0)

BB 8: pc 12: brf R(0), -|, Q(14)

BB 9: -> pc 13: abort(CASE_RANGE_LOW_EXC)

BB 10:-> pc 14: gt i(0,5), 109, R(0)

BB 10: pc 15: brf R(0), -|, Q(17)

BB 11:-> pc 16: abort(CASE_RANGE_HIGH_EXC)

BB 12:-> pc 17: eq i(0,5), 108, R(0)

BB 12: pc 18: brt R(0), -|, Q(9)

BB 13:-> pc 19: eq i(0,5), 109, R(0)

BB 13: pc 20: brt R(0), -|, Q(9)

BB 14:-> pc 21: br -|, -|, Q(38)

BB 15:-> pc 22: lt c(0,4), 97, R(0)

BB 15: pc 23: brt R(0), -|, Q(8)

BB 16:-> pc 24: gt c(0,4), 106, R(0)

BB 16: pc 25: brt R(0), -|, Q(8)

BB 17:-> pc 26: subi c(0,4), 97, R(0)

BB 17: pc 27: brx R(0), 10, .

BB 18: -> pc 28: br -|, -|, Q(4) Q(4)

BB 19: -> pc 29: br -|, -|, Q(4) Q(4)

BB 20: -> pc 30: br -|, -|, Q(4) Q(4)

BB 21: -> pc 31: br -|, -|, Q(8) Q(8)

BB 22: -> pc 32: br -|, -|, Q(6) Q(6)

BB 23: -> pc 33: br -|, -|, Q(6) Q(6)

BB 24: -> pc 34: br -|, -|, Q(6) Q(6)

BB 25: -> pc 35: br -|, -|, Q(8) Q(8)

BB 26: -> pc 36: br -|, -|, Q(6) Q(6)

BB 27: -> pc 37: br -|, -|, Q(6) Q(6)

BB 28: -> pc 38: HALT

 

References

  1. Aho, A. et al.: Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishing Co., 1986 or newer, ISBN 0-201-10088-6.