COMPILING TO NATIVE CODE ======================== Java source code: public static void foo(int a[], int b[], int i) { a[i] = (2 * a[i] + b[i]); } JVM code: Method void foo(int[], int[], int) 0 aload_0 ; a 1 iload_2 ; i 2 iconst_2 3 aload_0 ; a 4 iload_2 ; i 5 iaload ; a[i] 6 imul ; a[i] * 2 7 aload_1 ; b 8 iload_2 ; i 9 iaload ; b[i] 10 iadd ; a[i] * 2 + b[i]; 11 iastore ; a[i] 12 return Code generation. Assume RISC-like load/store architecture, with registers r1,r2,r3,...,fp,sp. Instructions: move r,r add r,r/c,r sll c,r,r cmp r,r/c br{eq/neq/leu/...} lab load c(r),r store r,c(r) Assume constants have been resolved prior to code generation. STACK LAYOUT ============ Stack layout for procedure with m locals, of which n are args; and max stack depth s. ---------------------- | local 0 = arg 0 | ---------------------- | ... | ---------------------- | local n-1 = arg n-1| ---------------------- | local n | ---------------------- | ... | ---------------------- | local m-1 | ---------------------- | return addr | ---------------------- fp -> | saved fp | ---------------------- | operand stack 0 | ---------------------- | operand stack 1 | ---------------------- | ... | ---------------------- sp -> | operand stack p | (p < s) ---------------------- STACK LAYOUT EXAMPLE ==================== For routine foo: ---------------------- fp+16 | arg 0 (a) | ---------------------- fp+12 | arg 1 (b) | ---------------------- fp+8 | arg 2 (i) | ---------------------- fp+4 | return addr | ---------------------- fp ---> | saved fp | <--- ---------------------- fp-4 | operand stack 0 | ---------------------- fp-8 | operand stack 1 | possible ---------------------- sp fp-12 | operand stack 2 | values ---------------------- fp-16 | operand stack 3 | ---------------------- fp-20 | operand stack 4 | <--- ---------------------- Integer Arrays: ---------------------- a ---> | class ptr -------|-------> Object ---------------------- a+4 | length | ---------------------- a+8 | a[0] | ---------------------- a+12 | a[1] | ---------------------- | ... | ---------------------- a+8+i*4 | a[i] | ---------------------- | ... | ---------------------- | a[length-1] | ---------------------- NATIVE CODE VERSION 0 ====================== Very naive; just lay down code equivalent to what's executed by interpreter. foo: store fp,(sp) ; save old frame pointer move sp,fp ; new frame pointer ;aload_0 load 16(fp),r0 ; fetch a add sp,#-4,sp ; push store r0,(sp) ; a ;iload_2 load 8(fp),r0 ; fetch i add sp,#-4,sp ; push store r0,(sp) ; i ;iconst_2 move #2,r0 add sp,#-4,sp ; push store r0,(sp) ; 2 ;aload_0 load 16(fp),r0 ; fetch a add sp,#-4,sp ; push store r0,(sp) ; a ;iload_2 load 8(fp),r0 ; fetch i add sp,#-4,sp ; push store r0,(sp) ; i ;iaload load 4(sp),r0 ; fetch array ptr cmp r0,#0 ; null check (can often use MMU instead) bneq L1 ; branch if ok ...raise exception... L1: load 4(r0),r1 ; array length load (sp),r2 ; index cmp r2,r1 ; bounds check brltu L2 ; branch if ok (note standard trick) ... raise exception ... L2: sll #2,r2,r2 ; calculate byte offset add r0,r2,r2 ; add array base load 8(r2),r2 ; fetch a[i] add sp,#4,sp ; pop one store r2,(sp) ; push a[i] (replace) ;imul load (sp),r0 ; fetch a[i] load 4(sp),r1 ; 2 mul r0,r1,r1 ; a[i] * 2 add sp,#4,sp ; pop one store r1,-12(sp) ; push a[i]*2 (replace) ;aload_1 load 12(fp),r0 ; fetch b add sp,#-4,sp ; push store r0,(sp) ; b ;iload_2 load 8(fp),r0 ; fetch i add sp,#-4,sp ; push store r0,(sp) ; i ;iaload load 4(sp),r0 ; fetch array ptr cmp r0,#0 ; null check (can often use MMU instead) bneq L3 ; branch if ok ...raise exception... L3: load 4(r0),r1 ; array length load (sp),r2 ; index cmp r2,r1 ; bounds check brltu L4 ; branch if ok ... raise exception ... L4: sll #2,r2,r2 ; calculate byte offset add r0,r2,r2 ; add array base load 8(r2),r2 ; fetch b[i] add sp,#4,sp ; pop one store r2,(sp) ; push b[i] (replace) ; iadd load 4(sp),r0 ; fetch 2*a[i] load (sp),r1 ; fetch b[i] add r0,r1,r1 ; 2*a[i] + b[i] add sp,#4,sp ; pop one store r1,(sp) ; push 2*a[i] + b[i] (replace) ; iastore load 8(sp),r0 ; fetch array ptr cmp r0,#0 ; null check (can often use MMU instead) bneq L5 ; branch if ok ...raise exception... L5: load 4(r0),r1 ; array length load 4(sp),r2 ; index cmp r2,r1 ; bounds check brltu L6 ; branch if ok ... raise exception ... L6: sll #2,r2,r2 ; calculate byte offset add r0,r2,r2 ; add array base load (sp),r0 ; fetch value store r0,8(r2) ; store a[i] add sp,#12,sp ; pop three ; return load (sp),fp ; restore frame pointer ret 71 instructions; 23 loads; 13 stores; 3 registers. NATIVE CODE VERSION 1 ===================== Since we always know current stack depth, can treat operand stack as an array, saving the cost of adjusting sp each time we push or pop. foo: store fp,(sp) ; save old frame pointer move sp,fp ; new frame pointer add sp,#-20,sp ; make space for operand stack ;aload_0 load 16(fp),r0 ; fetch a store r0,-4(fp) ; push a ;iload_2 load 8(fp),r0 ; fetch i store r0,-8(fp) ; push i ;iconst_2 move #2,r0 store r0,-12(fp) ; push 2 ;aload_0 load 16(fp),r0 ; fetch a store r0,-16(fp) ; push a ;iload_2 load 8(fp),r0 ; fetch i store r0,-20(fp) ; push i ;iaload load -16(fp),r0 ; fetch array ptr cmp r0,#0 ; null check (can often use MMU instead) bneq L1 ; branch if ok ...raise exception... L1: load 4(r0),r1 ; array length load -20(fp),r2 ; index cmp r2,r1 ; bounds check brltu L2 ; branch if ok ... raise exception ... L2: sll #2,r2,r2 ; calculate byte offset add r0,r2,r2 ; add array base load 8(r2),r2 ; fetch a[i] store r2,-16(fp) ; push ;imul load -16(fp),r0 ; fetch a[i] load -12(fp),r1 ; 2 mul r0,r1,r1 ; a[i] * 2 store r1,-12(fp) ; push a[i]*2 ;aload_1 load 12(fp),r0 ; fetch b store r0,-16(fp) ; push b ;iload_2 load 8(fp),r0 ; fetch i store r0,-20(fp) ; push i ;iaload load -16(fp),r0 ; fetch array ptr cmp r0,#0 ; null check (can often use MMU instead) bneq L3 ; branch if ok ...raise exception... L3: load 4(r0),r1 ; array length load -20(fp),r2 ; index cmp r2,r1 ; bounds check brltu L4 ; branch if ok ... raise exception ... L4: sll #2,r2,r2 ; calculate byte offset add r0,r2,r2 ; add array base load 8(r2),r2 ; fetch b[i] store r2,-16(fp) ; push ; iadd load -12(fp),r0 ; fetch 2*a[i] load -16(fp),r1 ; fetch b[i] add r0,r1,r1 ; 2*a[i] + b[i] store r1,-12(fp) ; push ; iastore load -4(fp),r0 ; fetch array ptr cmp r0,#0 ; null check (can often use MMU instead) bneq L5 ; branch if ok ...raise exception... L5: load 4(r0),r1 ; array length load -8(fp),r2 ; index cmp r2,r1 ; bounds check brltu L6 ; branch if ok ... raise exception ... L6: sll #2,r2,r2 ; calculate byte offset add r0,r2,r2 ; add array base load -12(fp),r0 ; fetch value store r0,8(r2) ; store a[i] ; return move fp,sp ; pop operand stack frame load (sp),fp ; restore frame pointer ret 61 instructions; 23 loads; 13 stores; 3 registers. NATIVE CODE VERSION 2 ===================== Cache stack and local variables in registers, spilling (not shown here) if necessary. Note: This approach may not buy too much on a machine with few registers, because of greatly increased spilling. Register assignments: Stack registers: r0,...,r4 Local variable registers: r10,r11,r12 Scratch registers: r20 In reality, arguments are likely to be in registers already, so those local variables can just be assumed to be in place. foo: store fp,(sp) ; save old frame pointer move sp,fp ; new frame pointer ;aload_0 load 16(fp),r10 ; fetch a (first ref) move r10,r0 ; push a ;iload_2 load 8(fp),r12 ; fetch i (first ref) move r12,r1 ; push i ;iconst_2 move #2,r2 ; push 2 ;aload_0 move r10,r3 ; push a ;iload_2 move r12,r4 ; push i ;iaload cmp r3,#0 ; null check (can often use MMU instead) bneq L1 ; branch if ok ...raise exception... L1: load 4(r3),r20 ; array length cmp r4,r20 ; bounds check brltu L2 ; branch if ok ... raise exception ... L2: sll #2,r4,r4 ; calculate byte offset add r3,r4,r4 ; add array base load 8(r4),r3 ; fetch and push a[i] ;imul mul r2,r3,r2 ; push a[i]*2 ;aload_1 load 12(fp),r11 ; fetch b (first ref) move r11,r3 ; push b ;iload_2 move r12,r4 ; push i ;iaload cmp r3,#0 ; null check (can often use MMU instead) bneq L3 ; branch if ok ...raise exception... L3: load 4(r3),r20 ; array length cmp r4,r20 ; bounds check brltu L4 ; branch if ok ... raise exception ... L4: sll #2,r4,r4 ; calculate byte offset add r3,r4,r4 ; add array base load 8(r4),r3 ; fetch and push b[i] ; iadd add r2,r3,r2 ; push a[i]*2 + b[i] ; iastore cmp r0,#0 ; null check (can often use MMU instead) bneq L5 ; branch if ok ...raise exception... L5: load 4(r0),r20 ; array length cmp r1,r20 ; bounds check brltu L6 ; branch if ok ... raise exception ... L6: sll #2,r1,r1 ; calculate byte offset add r0,r1,r1 ; add array base store r2,8(r1) ; store a[i] ; return load (sp),fp ; restore frame pointer ret 41 instructions; 9 loads; 2 stores; 9 registers. NATIVE CODE VERSION 3 ===================== Use more elaborate descriptors for stack slots - register - constant to achieve constant propagation and avoid register-register moves. Stack registers: r0,... (as needed) Local variable registers: r10,r11,r12 Scratch registers: r20 foo: store fp,(sp) ; save old frame pointer move sp,fp ; new frame pointer ;aload_0 load 16(fp),r10 ; fetch a (first ref) stack[0] = r10 ;iload_2 load 8(fp),r12 ; fetch i (first ref) stack[1] = r12 ;iconst_2 ; stack[2] = 2 ;aload_0 ; stack[3] = r10 ;iload_2 ; stack[4] = r12 ;iaload cmp r10,#0 ; null check (can often use MMU instead) bneq L1 ; branch if ok ...raise exception... L1: load 4(r10),r20 ; array length cmp r12,r20 ; bounds check brltu L2 ; branch if ok ... raise exception ... L2: sll #2,r12,r20 ; calculate byte offset add r10,r20,r20 ; add array base load 8(r20),r0 ; fetch a[i] stack[3] = r0 (tos = 3) ;imul mul r0,2,r0 ; a[i]*2 NOTE SWAP! stack[2] = r0 (tos = 2) ;aload_1 load 12(fp),r11 ; fetch b (first ref) stack[3] = r11 ;iload_2 ; stack[4] = r12 ;iaload cmp r11,#0 ; null check (can often use MMU instead) bneq L3 ; branch if ok ...raise exception... L3: load 4(r11),r20 ; array length cmp r12,r20 ; bounds check brltu L4 ; branch if ok ... raise exception ... L4: sll #2,r12,r20 ; calculate byte offset add r11,r20,r20 ; add array base load 8(r20),r1 ; fetch b[i] stack[3] = r1 (tos = 3) ; iadd add r0,r1,r0 ; a[i]*2 + b[i] stack[2] = r0 (tos = 2) ; iastore cmp r10,#0 ; null check (can often use MMU instead) bneq L5 ; branch if ok ...raise exception... L5: load 4(r10),r20 ; array length cmp r12,r20 ; bounds check brltu L6 ; branch if ok ... raise exception ... L6: sll #2,r12,r20 ; calculate byte offset add r10,r20,r20 ; add array base store r0,8(r20) ; store a[i] ; return load (sp),fp ; restore frame pointer ret 33 instructions; 9 loads; 2 stores; 6 registers.