Strong Connectivity

This chapter outlines an algorithm by Robert Tarjan, ref [1], that identifies all strongly connected components in a directed graph G with entry point e. Strongly connected components (SCCs) are the maximal subgraphs of G in which every node can be reached from every other node of the same SCC. Since every node trivially reaches itself, a node that lies on no cycle with other nodes forms a trivial SCC on its own.
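The definition can be checked directly, if slowly, by computing which nodes reach which and then grouping nodes that reach each other. The sketch below does this with Warshall's transitive-closure algorithm. It is not Tarjan's method: it costs O(n^3) time and is only meant as a cross-check on small graphs. The adjacency-matrix representation, the bound `MAXN`, and the names `transitive_closure` and `naive_scc` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16   /* hypothetical size bound for this sketch */

/* reach[i][j] becomes true iff j is reachable from i (path length >= 0). */
static void transitive_closure(int n, bool adj[MAXN][MAXN], bool reach[MAXN][MAXN])
{
    assert(n >= 0 && n <= MAXN);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            reach[i][j] = adj[i][j] || i == j;   /* every node reaches itself */
    for (int k = 0; k < n; k++)                  /* Warshall: allow k as a via-node */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = true;
}

/* comp[v] = smallest node id in v's SCC; returns the number of SCCs. */
static int naive_scc(int n, bool adj[MAXN][MAXN], int comp[MAXN])
{
    bool reach[MAXN][MAXN];
    int count = 0;

    transitive_closure(n, adj, reach);
    for (int v = 0; v < n; v++) {
        comp[v] = v;
        for (int u = 0; u < v; u++)
            if (reach[u][v] && reach[v][u]) {    /* mutually reachable */
                comp[v] = comp[u];
                break;
            }
        if (comp[v] == v)                        /* v starts a new SCC */
            count++;
    }
    return count;
}
```

For example, with arcs 0->1, 1->2, 2->0, 2->3, 3->4, 4->3 and an isolated node 5, `naive_scc` reports three SCCs: {0, 1, 2}, {3, 4}, and the trivial SCC {5}.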

Problem Definition of Strongly Connected Component Analysis

A directed graph G is a static abstraction of a program p, in which each node of G represents a Basic Block (BB) of p. The edges of G represent the control flow from a node n to its successor nodes. For many transformations it is important to identify loops in the program, which means identifying suitable subgraphs of G. A loop is a strongly connected component with additional restrictions. Tarjan's algorithm is a simple and efficient method of finding all SCCs of G: a single depth-first traversal visits every node and every edge once, so it runs in time linear in the size of G, O(|N| + |E|) for N nodes and E edges.

Synopsis

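• Definitions
• SCC Analysis Method
• Pseudo Code of Tarjan's Algorithm
• Data Structures for implementation of Tarjan's Algorithm
• C Code of a Real Implementation of Tarjan's Idea
• References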

Definitions

Basic Block (BB):

Sequence of one or more instructions with a unique header (entry point) and a single exit point. Note that the two need not be distinct: in a BB of a single instruction, the header is also the exit point.

Control Flow Graph:

Directed graph whose nodes are the BBs of a program. The directed edges indicate from which BB control can flow to which other BBs. One node in the CFG is special, named the start node or entry node: it is the node through which control enters from outside the program.
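One possible in-memory form of a CFG is sketched below: each BB records its successors, and a depth-first walk from the entry node finds the BBs that control can actually reach. The type `bb_t`, the fixed bound `MAX_SUCC`, and the helpers `add_edge` and `count_reachable` are hypothetical. The real implementation later in this chapter stores successors in a union selected by the kind of the BB's last instruction.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SUCC 4   /* hypothetical bound on successors per BB */

typedef struct bb {
    int        id;               /* BB number */
    bool       visited;          /* set by count_reachable */
    int        nsucc;            /* number of valid entries in succ[] */
    struct bb *succ[MAX_SUCC];   /* control may flow from this BB to succ[i] */
} bb_t;

/* Record a control-flow edge from -> to. */
static void add_edge(bb_t *from, bb_t *to)
{
    assert(from->nsucc < MAX_SUCC);
    from->succ[from->nsucc++] = to;
}

/* Depth-first walk from v (normally the entry node); returns how many
 * not-yet-visited BBs, v included, control can reach from v. */
static int count_reachable(bb_t *v)
{
    int count = 1;
    v->visited = true;
    for (int i = 0; i < v->nsucc; i++)
        if (!v->succ[i]->visited)
            count += count_reachable(v->succ[i]);
    return count;
}
```

With edges 0->1, 0->2, 1->3, 2->3 and an extra BB 4 whose only edge is 4->3, a walk from BB 0 reaches four BBs. BB 4 is never visited, because nothing flows into it from the entry.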

Dominator:

A node d of G dominates a node n, also in G, if every path from e to n passes through d. Note that d and n need not be distinct.
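The dominator definition leads to a standard fixed-point computation. Dom(e) = {e}, and for every other node n, Dom(n) is {n} together with the intersection of Dom(p) over all predecessors p of n. The sketch below iterates that equation until nothing changes, storing each set as a bit mask. It assumes that node 0 is e, that every node is reachable from e, and that there are at most 32 nodes. The function name and array layout are illustrative only. As a side effect, e ends up in every Dom(n).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXN 32   /* one bit per node in a uint32_t (assumption) */

/* pred[v][k] (k < npred[v]) are the predecessors of v; node 0 is e.
 * On return, bit d of dom[v] is set iff d dominates v.
 * Assumes every node is reachable from e. */
static void dominators(int n, int npred[MAXN], int pred[MAXN][MAXN], uint32_t dom[MAXN])
{
    assert(n >= 1 && n <= MAXN);
    uint32_t all = (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);

    dom[0] = UINT32_C(1);            /* Dom(e) = {e} */
    for (int v = 1; v < n; v++)
        dom[v] = all;                /* start from "all nodes" and shrink */

    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 1; v < n; v++) {
            uint32_t d = all;
            for (int k = 0; k < npred[v]; k++)
                d &= dom[pred[v][k]];    /* must dominate every predecessor */
            d |= UINT32_C(1) << v;       /* v dominates itself */
            if (d != dom[v]) {
                dom[v] = d;
                changed = true;
            }
        }
    }
}
```

For the diamond 0->1, 0->2, 1->3, 2->3 followed by 3->4, Dom(3) is {0, 3} (neither 1 nor 2 lies on every path) and Dom(4) is {0, 3, 4}.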

Entry Node:

Start node e of a graph G. Note that e dominates every node in G, because every path from e to a node n begins at e.

Entry Point:

First instruction in a BB. Alias: Header, or leader. Antonym: Exit point.
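The usual way to split a linear instruction sequence into BBs is to mark the leaders first. A leader is the first instruction, any jump target, or any instruction that immediately follows a jump or return. Each BB then runs from one leader up to, but not including, the next. The sketch below applies that rule to a toy instruction format. `instr_t`, the `OP_*` opcodes, and `find_leaders` are hypothetical and do not correspond to the quad format (`first_q`, `last_q`) used by the implementation in this chapter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy instruction: jumps name a target instruction index. */
typedef enum { OP_PLAIN, OP_JUMP, OP_COND_JUMP, OP_RETURN } op_t;
typedef struct { op_t op; int target; } instr_t;

#define MAXQ 64   /* hypothetical bound on instructions */

/* Sets leader[i] for every instruction i that starts a BB and returns
 * the number of BBs. */
static int find_leaders(int nq, const instr_t q[], bool leader[])
{
    assert(nq >= 1 && nq <= MAXQ);
    for (int i = 0; i < nq; i++)
        leader[i] = false;
    leader[0] = true;                            /* first instruction */
    for (int i = 0; i < nq; i++) {
        if (q[i].op == OP_JUMP || q[i].op == OP_COND_JUMP) {
            assert(q[i].target >= 0 && q[i].target < nq);
            leader[q[i].target] = true;          /* jump target */
        }
        if (q[i].op != OP_PLAIN && i + 1 < nq)
            leader[i + 1] = true;                /* follows a jump or return */
    }
    int blocks = 0;
    for (int i = 0; i < nq; i++)
        blocks += leader[i];
    return blocks;
}
```

For the five-instruction sequence "plain; cond-jump to 4; plain; jump to 1; return", the leaders are instructions 0, 1, 2, and 4. That gives the BBs [0], [1], [2, 3], and [4], each with a single exit point at its last instruction.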

Spanning Tree:

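A (depth-first) spanning tree S of graph G is a tree containing all nodes of G reachable from e, obtained by visiting G as follows. The visit is depth-first, starting at e. Each node is listed only when it is visited for the first time, and an edge v->w of G is included in S only if the visit reaches w for the first time through that edge. As a consequence, some of the original edges of G will be missing from S. This is natural, as a node in a tree never has more than one parent. Note that S is not unique; it depends on the order in which the successors of each node are visited.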
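The construction of S can be sketched as a recursive depth-first walk that keeps, for every node, its visit number and its parent in S. An arc v->w becomes a tree arc only when w has not been visited yet; every other arc is left out of S. The `graph_t` layout, `MAXN`, and the function names are assumptions for this sketch, and node 0 stands for the entry e.

```c
#include <assert.h>

#define MAXN 16   /* hypothetical size bound */

typedef struct {
    int n;                  /* nodes 0..n-1; node 0 is the entry e */
    int nsucc[MAXN];
    int succ[MAXN][MAXN];   /* succ[v][i], i < nsucc[v] */
} graph_t;

/* number[v] is v's depth-first visit number (0 = not yet visited);
 * parent[v] is v's parent in S (-1 for e and for unreached nodes). */
static void dfs_tree(const graph_t *g, int v, int *counter, int number[], int parent[])
{
    number[v] = ++*counter;
    for (int i = 0; i < g->nsucc[v]; i++) {
        int w = g->succ[v][i];
        if (number[w] == 0) {            /* first visit of w: tree arc */
            parent[w] = v;
            dfs_tree(g, w, counter, number, parent);
        }                                /* otherwise v->w is not in S */
    }
}

static void spanning_tree(const graph_t *g, int number[], int parent[])
{
    int counter = 0;
    assert(g->n >= 1 && g->n <= MAXN);
    for (int v = 0; v < g->n; v++) {
        number[v] = 0;
        parent[v] = -1;
    }
    dfs_tree(g, 0, &counter, number, parent);
}
```

With arcs 0->1, 0->2, 1->2, 2->0, the walk visits 0, 1, 2 in that order. S therefore contains 0->1 and 1->2, while 0->2 and 2->0 are missing, as the definition predicts.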

SCC Analysis Method

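The method performs a single depth-first search of G. Each node v receives a number, number(v), in the order in which it is first visited, and is pushed onto a stack. Its lowlink, lowlink(v), is the smallest number of any node that is still on the stack and can be reached from v by following tree arcs of the spanning tree S and then at most one other arc. After all successors of v have been processed, lowlink(v) = number(v) holds exactly when v is the root of an SCC. That SCC consists of v and all nodes above v on the stack, which are then popped off together.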

Pseudo Code of Tarjan's Algorithm

Strongly Connected Components, Tarjan's Algorithm

    // A node in Tarjan's notation has associated array structures, indexed
    // by node ID; these arrays are lowlink() and number(). Also, there
    // is a stack of nodes; in_stack(w) tells whether w is currently on it.
    // The stack, the arrays, and an integer counter scc_number are
    // globally visible.
    int scc_number;

    procedure scc( v )
    { // scc
        mark v as 'visited'
        lowlink(v) := number(v) := ++scc_number
        push( v )
        for all successors w of v do
            if w is not visited then            -- v->w is a tree arc
                scc( w )
                lowlink(v) := min( lowlink(v), lowlink(w) )
            elsif number(w) < number(v) then    -- v->w is a frond or cross-link
                if in_stack(w) then
                    lowlink(v) := min( lowlink(v), number(w) )
                end if
            end if
        end for
        if lowlink(v) = number(v) then          -- v is the root of the next SCC
            while the stack is not empty and
                  number( top_of_stack_node ) >= number(v) do
                pop( w )                        -- w is a member of the SCC rooted at v
            end while
        end if
    } // end scc

    procedure main()
    { // main
        scc_number := 0
        empty the stack
        mark all nodes in G as 'not visited'
        for each node v in G that is not visited do
            scc( v )
        end for
    } // end main
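The pseudo code translates almost line for line into C. The sketch below keeps number(), lowlink(), and the in_stack test as arrays, uses a plain array as the stack, and also records a component id for every node. The graph layout, the `tarjan_t` state record, and the function names are assumptions for illustration. Unlike the real implementation further down, this version does not skip singleton SCCs.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16   /* hypothetical size bound */

typedef struct {
    int n;
    int nsucc[MAXN];
    int succ[MAXN][MAXN];
} graph_t;

typedef struct {
    int  number[MAXN];    /* Tarjan's NUMBER; 0 = not visited */
    int  lowlink[MAXN];   /* Tarjan's LOWLINK */
    bool in_stack[MAXN];
    int  stack[MAXN];
    int  sp;              /* number of entries on the stack */
    int  counter;         /* scc_number in the pseudo code */
    int  comp[MAXN];      /* SCC id of each node, in order of completion */
    int  ncomp;
} tarjan_t;

static int min_int(int a, int b) { return a < b ? a : b; }

static void scc(const graph_t *g, tarjan_t *t, int v)
{
    t->lowlink[v] = t->number[v] = ++t->counter;
    t->stack[t->sp++] = v;
    t->in_stack[v] = true;
    for (int i = 0; i < g->nsucc[v]; i++) {
        int w = g->succ[v][i];
        if (t->number[w] == 0) {                      /* tree arc */
            scc(g, t, w);
            t->lowlink[v] = min_int(t->lowlink[v], t->lowlink[w]);
        } else if (t->number[w] < t->number[v]) {     /* frond or cross-link */
            if (t->in_stack[w])
                t->lowlink[v] = min_int(t->lowlink[v], t->number[w]);
        }
    }
    if (t->lowlink[v] == t->number[v]) {              /* v is the root of an SCC */
        while (t->sp > 0 && t->number[t->stack[t->sp - 1]] >= t->number[v]) {
            int w = t->stack[--t->sp];
            t->in_stack[w] = false;
            t->comp[w] = t->ncomp;
        }
        t->ncomp++;
    }
}

/* Returns the number of SCCs, trivial ones included. */
static int tarjan(const graph_t *g, tarjan_t *t)
{
    assert(g->n >= 0 && g->n <= MAXN);
    t->sp = t->counter = t->ncomp = 0;
    for (int v = 0; v < g->n; v++) {
        t->number[v] = 0;
        t->in_stack[v] = false;
    }
    for (int v = 0; v < g->n; v++)
        if (t->number[v] == 0)
            scc(g, t, v);
    return t->ncomp;
}
```

Consider the graph with arcs 0->1, 1->2, 2->0, 2->3, 3->4, 4->3 plus an isolated node 5. Here `tarjan` returns 3 and completes the components in the order {3, 4}, {0, 1, 2}, {5}. Components are always completed in reverse topological order of the component graph, a known property of Tarjan's algorithm.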

Data Structures for implementation of Tarjan's Algorithm

Strongly Connected Components Data Structures

    typedef struct cfg_node_tp {
        BOOL              visited;     /* set during cfg traversal; during scc: node is on the stack */
        BOOL              is_exit;     /* only true for special exit node */
        BB_Num_Range      cfg_bb_num;  /* which BB am I? BB num is my id */
        Quad_Range        first_q;     /* q[first] is first opcode of BB */
        Quad_Range        last_q;      /* q[last] is last opcode of BB */
        cfg_class_enum_tp exit_class;  /* classify last instruction of BB */
        qop_class_enum_tp first_op;    /* opcode of first instr in this BB */
        qop_class_enum_tp last_op;     /* opcode of last instr in this BB */
        cfg_node_ptr_tp   finger;      /* finger through; start at ada_cfg */
        cfg_succ_union_tp succu;       /* union of all successors */
        unsigned          scc_lowlink; /* Tarjan's LOWLINK */
        unsigned          scc_number;  /* Tarjan's NUMBER */
        unsigned          SCC;         /* number of SCC */
        cfg_node_ptr_tp   scc_pred;    /* no need for separate stack */
    } cfg_node_struct_tp;

    #define CFG_NODE_SIZE sizeof( cfg_node_struct_tp )

    /*********************************************************************/
    /**********                                                 **********/
    /**********                 S C C   T y p e s               **********/
    /**********                                                 **********/
    /*********************************************************************/

    /* Create a linked list of pointers to those cfg nodes that head an
     * SCC, one entry per SCC. The list is reached through the global
     * ``scc_head,'' initially NIL; all other components can be reached
     * from there via scc_pred.
     */
    #ifdef SIMPLE_COMPILER
    struct scc_node_tp;                 /* forward declaration */
    #endif /* SIMPLE_COMPILER */

    typedef struct scc_node_tp * scc_node_ptr_tp;

    typedef struct scc_node_tp {
        cfg_node_ptr_tp scc_head_in_cfg;
        scc_node_ptr_tp scc_pred;
    } scc_node_struct_tp;

    #define SCC_NODE_SIZE sizeof( scc_node_struct_tp )
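The `scc_pred` field together with the `visited` flag lets the stack live inside the CFG nodes themselves. Because a node is on the stack at most once, one link per node suffices, and no separate stack has to be allocated. The cut-down `node_t` below keeps only those fields. Its name and the global `stack_top` are illustrative stand-ins for `cfg_node_struct_tp` and the implementation's `scc_stack`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down node: only what the intrusive stack needs. */
typedef struct node {
    int          bb_num;
    bool         visited;    /* doubles as "is on the stack" */
    struct node *scc_pred;   /* link to the node below on the stack */
} node_t;

static node_t *stack_top = NULL;   /* plays the role of scc_stack */

static void push(node_t *v)
{
    v->visited  = true;
    v->scc_pred = stack_top;
    stack_top   = v;
}

static node_t *pop(void)
{
    node_t *v = stack_top;
    assert(v != NULL);
    v->visited = false;            /* no longer on the stack */
    stack_top  = v->scc_pred;
    return v;
}
```

Push and pop are O(1), and `visited` answers Tarjan's in_stack(w) question in O(1) as well.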

C Code of a Real Implementation of Tarjan's Idea

Implementation of Strongly Connected Components

    /* Global changes, careful: create a linked list of nodes, pointed at
     * by the global ``scc_stack,'' initially NIL. Each time we push(),
     * field v->scc_pred points to scc_stack and scc_stack is made to
     * point to the current node v. Field ``visited'' doubles as the
     * on-stack flag: push() sets it and pop() clears it.
     */
    PRIVATE void push( cfg_node_ptr_tp v )
    { /* push */
        v->visited  = BTRUE;
        v->scc_pred = scc_stack;
        scc_stack   = v;
    } /*end push*/

    PRIVATE void pop( void )
    { /* pop */
        scc_stack->visited = BFALSE;  /* required: scc_action() tests it as in_stack */
        scc_stack = scc_stack->scc_pred;
    } /*end pop*/

    PRIVATE void scc( cfg_node_ptr_tp v );  /* forward declaration */

    #define min( a, b ) ( ( a ) < ( b ) ? ( a ) : ( b ) )

    PRIVATE void scc_action( cfg_node_ptr_tp v, cfg_node_ptr_tp w )
    { /* scc_action */
    # ifdef CFG_DEBUG
        if ( !v ) {
            cfg_abort( 1203, "scc_action was called with NIL v" );
        } /*end if*/
        if ( !w ) {
            cfg_abort( 1204, "scc_action was called with NIL w" );
        } /*end if*/
    # endif /* CFG_DEBUG */
        if ( ! w->scc_number ) {
            /* w not yet numbered: v->w is a tree arc */
            scc( w );
            v->scc_lowlink = min( v->scc_lowlink, w->scc_lowlink );
        }else if ( w->scc_number < v->scc_number ) {
            /* frond or cross-link: counts only if w is still on the stack */
            if ( w->visited ) {
                v->scc_lowlink = min( v->scc_lowlink, w->scc_number );
            } /*end if*/
        } /*end if*/
    } /*end scc_action*/

    PRIVATE void scc_semantics( cfg_node_ptr_tp v )
    { /* scc_semantics */
        scc_node_ptr_tp temp = (scc_node_ptr_tp)my_malloc( SCC_NODE_SIZE );

        /* first time around remember NIL */
        temp->scc_pred        = scc_head;
        temp->scc_head_in_cfg = v;
        scc_head              = temp;
    } /*end scc_semantics*/

    /* Find Strongly Connected Components a-la Tarjan. See the wonderful
     * paper by Tarjan:
     *
     *   ``Depth-First Search and Linear Graph Algorithms''
     *   Robert Tarjan
     *   SIAM J. Comput. Vol. 1, No. 2, June 1972.
     *
     * Minor mods: I detect singleton SCCs and do NOT count them. I don't
     * detect a cycle (self-loop) on a singleton node; that may be of
     * interest for other SCC detections.
     */
    PRIVATE void scc( cfg_node_ptr_tp v )
    { /* scc */
        brx_node_ptr_tp w;

        /* safely assume that v->scc_number == 0 */
    # ifdef CFG_DEBUG
        if ( !v ) {
            printf( " <><> Impossible, v is NIL\n" );
            cfg_abort( 1112, "Node v must be != NIL" );
        } /*end if*/
        if ( v->scc_number ) {
            printf( " <><> node must be 0. BB(%d)\n", v->cfg_bb_num );
            cfg_abort( 1200, "Node must be 0" );
        } /*end if*/
    # endif /* CFG_DEBUG */
        v->scc_lowlink = v->scc_number = ++scc_number;
        push( v );
        switch ( v->exit_class ) {
            case cfg_br:
                scc_action( v, v->succu.br );
                break;
            case cfg_brx:
                for ( w = v->succu.brx; w; w = w->next ) {
                    scc_action( v, w->brx_cfgx );
                } /*end for*/
                break;
            case cfg_cond_br_call:
                scc_action( v, v->succu.cond_br.cfgx_1 );
                scc_action( v, v->succu.cond_br.cfgx_2 );
                break;
            case cfg_fall_through:
                scc_action( v, v->succu.fall_through );
                break;
            default:
                /* others have no successors */
                ;
        } /*end switch*/
        /* processed all successors in adjacency list of v, if any */
        if ( v->scc_number == v->scc_lowlink ) {
            /* v is root of a new SCC */
            if ( scc_stack == v ) {
                /* A singleton SCC, just ignore it.
                 * If there is a cycle, it can be vectorized if needed;
                 * but for now I ain't looking for this.
                 */
                pop( );
            }else{
                cfg_node_ptr_tp head = v;
                ++SCC;
    #           ifdef SCC_DEBUG
                printf( "SCC(%d): ", SCC );
    #           endif /* SCC_DEBUG */
                /* Also stop at NIL: if v is the root of the whole DFS,
                 * popping v empties the stack. */
                while ( scc_stack && scc_stack->scc_number >= v->scc_number ) {
                    scc_stack->SCC = SCC;
                    head = scc_stack;
    #               ifdef SCC_DEBUG
                    printf( "BB(%d), ", head->cfg_bb_num );
    #               endif /* SCC_DEBUG */
                    pop( );
                } /*end while*/
    #           ifdef SCC_DEBUG
                printf( "\n" );
    #           endif /* SCC_DEBUG */
                scc_semantics( head );
            } /*end if*/
        } /*end if*/
    } /*end scc*/
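To see the real implementation's behavior without the surrounding compiler, here is a self-contained replica built on several assumptions. Successors are kept in a plain array instead of the `exit_class`/`succu` union, `my_malloc` is replaced by `malloc`, and the global counters are renamed `scc_counter` and `SCC_count`; all of these names are hypothetical. Like the original, it ignores singleton SCCs, including a node with a self-loop, and links the head of every other SCC into the `scc_head` list. The `while` loop checks `scc_stack` for NIL before dereferencing it, because popping the root of the whole walk empties the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_SUCC 4   /* hypothetical: replaces the exit_class/succu union */

typedef struct node {
    int          bb_num;
    int          nsucc;
    struct node *succ[MAX_SUCC];
    bool         visited;                    /* on-stack flag, as in the original */
    unsigned     scc_number, scc_lowlink, SCC;
    struct node *scc_pred;                   /* intrusive stack link */
} node_t;

typedef struct head {                        /* mirrors scc_node_struct_tp */
    node_t      *scc_head_in_cfg;
    struct head *scc_pred;
} head_t;

static node_t  *scc_stack   = NULL;
static head_t  *scc_head    = NULL;
static unsigned scc_counter = 0, SCC_count = 0;

static void push(node_t *v) { v->visited = true; v->scc_pred = scc_stack; scc_stack = v; }
static void pop(void)       { scc_stack->visited = false; scc_stack = scc_stack->scc_pred; }
static unsigned umin(unsigned a, unsigned b) { return a < b ? a : b; }

static void scc(node_t *v)
{
    v->scc_lowlink = v->scc_number = ++scc_counter;
    push(v);
    for (int i = 0; i < v->nsucc; i++) {
        node_t *w = v->succ[i];
        if (!w->scc_number) {                              /* tree arc */
            scc(w);
            v->scc_lowlink = umin(v->scc_lowlink, w->scc_lowlink);
        } else if (w->scc_number < v->scc_number && w->visited) {
            v->scc_lowlink = umin(v->scc_lowlink, w->scc_number);
        }
    }
    if (v->scc_number != v->scc_lowlink)
        return;                                            /* v is not a root */
    if (scc_stack == v) {                                  /* singleton: ignored */
        pop();
        return;
    }
    ++SCC_count;
    node_t *head = v;
    while (scc_stack && scc_stack->scc_number >= v->scc_number) {
        scc_stack->SCC = SCC_count;
        head = scc_stack;
        pop();
    }
    head_t *h = malloc(sizeof *h);                         /* scc_semantics() */
    assert(h != NULL);
    h->scc_head_in_cfg = head;
    h->scc_pred = scc_head;
    scc_head = h;
}
```

In a test graph with arcs 0->1, 1->0, 1->2, 2->3, 3->3, only {0, 1} is reported. BB 2 and the self-looping BB 3 are skipped as singletons, and the list entry points at node 0, the root of the SCC. Without the NIL check, the walk started at node 0 would dereference NIL after popping node 0.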

References

[1] Robert Tarjan: "Depth-First Search and Linear Graph Algorithms". SIAM J. Computing, Vol. 1, No. 2, June 1972.