Why study the Java Virtual Machine (JVM)?

• Understanding (and improving on) the output of Java compilers (just like looking at machine code from C).

• Understanding the input to just-in-time compilers and other optimized Java execution engines.

• Understanding portability issues.

• Understanding security issues.

• The JVM can be used as a target for other languages.

• You might want to implement your own JVM.

Example: Count

Count.java:

[Source listing not preserved in this copy.]
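
The source listing for Count.java is not available here. A reconstruction consistent with the disassembly on the next slide (inferred from the bytecode, so layout and naming are guesses rather than the original text) would be a loop that prints the integers 0 through 9:

```java
// Reconstruction of Count.java inferred from the disassembly;
// the original listing may have differed in layout or naming.
class Count {
    public static void main(String[] args) {
        // args occupies local variable 0, so i becomes local variable 1.
        for (int i = 0; i < 10; i++) {
            System.out.println(i);
        }
    }
}
```

The loop test is compiled at the bottom of the loop, which is why the bytecode starts with a goto to offset 15.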

Example (cont.)

    Compiled from Count.java
    synchronized class Count extends java.lang.Object
        /* ACC_SUPER bit set */
    {
        public static void main(java.lang.String[]);
        Count();
    }

    Method void main(java.lang.String[])
       0 iconst_0
       1 istore_1
       2 goto 15
       5 getstatic #6 <Field java.io.PrintStream out>
       8 iload_1
       9 invokevirtual #7 <Method void println(int)>
      12 iinc 1 1
      15 iload_1
      16 bipush 10
      18 if_icmplt 5
      21 return

    Method Count()
       0 aload_0
       1 invokespecial #5 <Method java.lang.Object()>
       4 return

Java Virtual Machine Architecture

A JVM contains the following components:

Program Counter (per thread)

Stack (per thread)

Heap (shared) - contains all objects

Method Area (shared) - byte-codes and constant pools

Native method stacks (per thread, if required)

Method code is a sequence of byte-code instructions that implement methods (and constructors). The JVM byte-code is stack-based; most instructions take their operands from the stack and leave their results there.

Each class has a constant pool, which contains all the constant data referenced by the methods of that class, including numbers, strings, and symbolic names of other classes and members referenced by this class.

Stacks and Frames

There is one stack per thread. A stack consists of a sequence of frames; frames need not be contiguous in memory. Frame size and overall stack size may be limited by implementations.

One frame is associated with each method invocation. Each frame contains two areas, each of statically fixed size (per method):

• local variable storage associated with the method, and

• an operand stack for evaluating expressions within the method and for communicating arguments and results with other methods.

The local variable area is an array of words, addressed by word offset from the array base. Most locals occupy one word; long and double values occupy two consecutive words. The arguments to a method (including this, for instance methods) always appear as its initial local variables.
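
As a rough illustration of this layout, the sketch below computes the starting local-variable index of each parameter. The class and method names (LocalSlots, parameterSlots) are made up for this example and are not part of any JVM API.

```java
import java.util.Arrays;

class LocalSlots {
    // Returns the starting local-variable index of each parameter.
    // For instance methods, index 0 is taken by `this`.
    static int[] parameterSlots(boolean isStatic, Class<?>... paramTypes) {
        int[] slots = new int[paramTypes.length];
        int next = isStatic ? 0 : 1;
        for (int i = 0; i < paramTypes.length; i++) {
            slots[i] = next;
            // long and double occupy two consecutive words; everything else one.
            boolean twoWords = paramTypes[i] == long.class || paramTypes[i] == double.class;
            next += twoWords ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // Instance method m(int, long, double, String): prints [1, 2, 4, 6]
        System.out.println(Arrays.toString(
            parameterSlots(false, int.class, long.class, double.class, String.class)));
        // Static main(String[]): prints [0]
        System.out.println(Arrays.toString(parameterSlots(true, String[].class)));
    }
}
```

The second call matches the Count example: args is local 0 of the static main method, so the loop counter is local 1.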

The operand stack is a stack of words. Most operands occupy one word; long and double values occupy two consecutive words, which must not be manipulated independently.

Frames may optionally contain additional information, e.g., for debugging.

Types and Verification

The JVM directly supports each of the primitive Java types (except boolean, which is mapped to int). Floating-point arithmetic follows IEEE 754. Values of reference types (classes, interfaces, arrays) are represented as heap pointers; layout of these values is implementation-dependent.

Data values are not tagged with type information, but instructions are. When executing, the JVM assumes that instructions are always operating on values of the correct type. The instruction set is designed to make it possible to verify that any given method is type-correct, without executing it. The JVM performs verification on any bytecode derived from an untrusted source (e.g., over the network).

At any given point of execution, each entry in the local variable area and the operand stack must have a well-defined type state; i.e., it must be possible to deduce the type of each entry unambiguously.

This is an unusual property for stacks! To enforce it, JVM code must be written with care. For example, when there are two execution paths to the same PC, they must arrive with identical type state. So, for example, it is impossible to use a loop to copy an array onto the stack.
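
The merge rule can be sketched as a comparison of operand-stack type states. This is a deliberately simplified model that follows the "identical type state" wording above; the names (TypeStateCheck, canMerge, afterBody) and the three-element type set are assumptions made for illustration, not the real verifier's data structures.

```java
import java.util.ArrayList;
import java.util.List;

class TypeStateCheck {
    enum Type { INT, FLOAT, REF }  // simplified set of verification types

    // Two paths may meet at the same PC only if their operand stacks have
    // the same depth and the same type in every position.
    static boolean canMerge(List<Type> a, List<Type> b) {
        return a.equals(b);
    }

    // Type state on returning to a loop head after one pass through a body
    // whose net effect is to push the given types.
    static List<Type> afterBody(List<Type> atHead, List<Type> netPushes) {
        List<Type> out = new ArrayList<>(atHead);
        out.addAll(netPushes);
        return out;
    }

    public static void main(String[] args) {
        List<Type> head = List.of();  // stack is empty on first entry to the loop
        List<Type> balanced = afterBody(head, List.of());          // body pops all it pushes
        List<Type> growing = afterBody(head, List.of(Type.INT));   // body leaves one int behind
        System.out.println(canMerge(head, balanced)); // true
        System.out.println(canMerge(head, growing));  // false
    }
}
```

The second case is the array-copying loop: each iteration would leave one more value on the stack, so the back edge reaches the loop head with a different stack depth than the first entry did.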

Instruction Set

Each JVM instruction consists of a one-byte opcode followed by zero or more parameters. Instructions are only byte-aligned. Multi-byte parameters are stored in big-endian order.

The inner loop of the JVM execution engine (ignoring exceptions) is effectively:

[Listing not preserved in this copy.]
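
A fetch-decode-execute loop of this kind can be sketched in Java. The version below is a toy, not a real JVM: it handles only the opcodes that appear in Count.main, models getstatic by pushing a stand-in word, and treats every invokevirtual as println(int) by recording the argument. The class and method names are made up; the opcode values and byte offsets follow the Count disassembly.

```java
import java.util.ArrayList;
import java.util.List;

class MiniInterpreter {
    // Reads a signed 16-bit big-endian branch offset starting at code[i].
    static int branchOffset(byte[] code, int i) {
        return (short) (((code[i] & 0xff) << 8) | (code[i + 1] & 0xff));
    }

    // Runs the code and returns the values println would have printed.
    static List<Integer> run(byte[] code, int maxLocals, int maxStack) {
        int[] locals = new int[maxLocals];
        int[] stack = new int[maxStack];
        int sp = 0;  // number of words currently on the operand stack
        int pc = 0;
        List<Integer> printed = new ArrayList<>();
        while (true) {
            int opcode = code[pc] & 0xff;  // fetch
            switch (opcode) {              // decode and execute
                case 0x03: stack[sp++] = 0; pc += 1; break;             // iconst_0
                case 0x10: stack[sp++] = code[pc + 1]; pc += 2; break;  // bipush (sign-extended)
                case 0x1b: stack[sp++] = locals[1]; pc += 1; break;     // iload_1
                case 0x3c: locals[1] = stack[--sp]; pc += 1; break;     // istore_1
                case 0x84:                                              // iinc index const
                    locals[code[pc + 1] & 0xff] += code[pc + 2];
                    pc += 3;
                    break;
                case 0xa1: {                                            // if_icmplt
                    int right = stack[--sp];
                    int left = stack[--sp];
                    pc += (left < right) ? branchOffset(code, pc + 1) : 3;
                    break;
                }
                case 0xa7: pc += branchOffset(code, pc + 1); break;     // goto
                case 0xb1: return printed;                              // return
                case 0xb2: stack[sp++] = 0; pc += 3; break;  // getstatic: push stand-in for System.out
                case 0xb6: {                                 // invokevirtual: assumed to be println(int)
                    int value = stack[--sp];
                    sp--;                                    // pop the stand-in receiver
                    printed.add(value);
                    pc += 3;
                    break;
                }
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    // The 22 bytes of Count.main, laid out at offsets 0..21 as in the disassembly.
    static byte[] countMain() {
        int[] raw = {
            0x03, 0x3c, 0xa7, 0x00, 0x0d, 0xb2, 0x00, 0x06, 0x1b,
            0xb6, 0x00, 0x07, 0x84, 0x01, 0x01, 0x1b, 0x10, 0x0a,
            0xa1, 0xff, 0xf3, 0xb1
        };
        byte[] code = new byte[raw.length];
        for (int i = 0; i < raw.length; i++) code[i] = (byte) raw[i];
        return code;
    }

    public static void main(String[] args) {
        System.out.println(run(countMain(), 2, 2)); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Branch parameters are offsets relative to the address of the branching instruction: the goto at offset 2 carries 0x000d (2 + 13 = 15) and the if_icmplt at offset 18 carries 0xfff3 (18 - 13 = 5), both stored high byte first.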

Most instructions take their operands from the top of the stack (popping them in the process) and push their result back on the top of the stack. A few operate directly on local variables.

Most instructions encode the type of their operands; thus, many instructions have multiple versions distinguished by their prefix (i, l, f, d, b, s, c, a).

The instruction set is not totally orthogonal; in particular, few operations are provided for bytes, shorts, and chars, and integer comparisons are much simpler than non-integer ones. In all, 201 out of 256 possible opcode values are used.

Families of instructions

Instructions group into families. Each family does the same basic operation, but has a variety of members distinguished by operand type and built-in arguments.

Example: load pushes the value of a local variable (specified as a parameter) onto the stack. Variants:

[Listing not preserved in this copy.]
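
One way to see the shape of the load family is to decode its opcode range. The helper below is a sketch with made-up names (LoadFamily, mnemonic); the opcode values are those of the standard instruction set, where 0x15 to 0x19 are the general forms and 0x1a to 0x2d are the short forms with the index built in.

```java
class LoadFamily {
    private static final String PREFIXES = "ilfda"; // int, long, float, double, reference

    // Returns the mnemonic for an opcode in the load family, or null otherwise.
    static String mnemonic(int opcode) {
        if (opcode >= 0x15 && opcode <= 0x19) {
            // General form: the local variable index follows as a one-byte parameter.
            return PREFIXES.charAt(opcode - 0x15) + "load";
        }
        if (opcode >= 0x1a && opcode <= 0x2d) {
            // Short forms: the index (0..3) is built into the opcode.
            int k = opcode - 0x1a;
            return PREFIXES.charAt(k / 4) + "load_" + (k % 4);
        }
        return null;
    }

    public static void main(String[] args) {
        for (int op = 0x15; op <= 0x2d; op++) {
            System.out.println(Integer.toHexString(op) + " " + mnemonic(op));
        }
    }
}
```

There are no b, s, or c members of this family: byte, short, and char values held in local variables are treated as ints.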

Load and Store

• load - push local variable onto stack
• store - pop top-of-stack into local variable
• push, ldc, const - push constant onto stack
• wide - modify the following load or store to take a wider (16-bit) local variable index

Arithmetic and Logic

• add, sub, mul, div, rem, neg
• shl, shr, ushr
• or, and, xor
• iinc - increment local variable

The integer versions of div and rem (idiv, ldiv, irem, lrem) throw an ArithmeticException given a zero divisor; the floating-point versions never throw.
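
The difference between integer and floating-point division by zero is visible from Java source, since the / and % operators compile to the div and rem instructions of the operand type. A small sketch (class and method names are illustrative only):

```java
class DivideByZero {
    // Integer division compiles to idiv, which checks for a zero divisor.
    static boolean intDivThrows(int a, int b) {
        try {
            int q = a / b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Floating-point division compiles to ddiv, which follows IEEE 754.
    static double doubleDiv(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(intDivThrows(1, 0));    // true
        System.out.println(doubleDiv(1.0, 0.0));   // Infinity
        System.out.println(doubleDiv(0.0, 0.0));   // NaN
    }
}
```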

Conversions

• i2l, i2f, i2d, l2f, l2d, f2d - widening conversions.
• i2b, i2c, i2s, etc. - narrowing conversions; never raise an exception.
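
Because these conversions never raise an exception, out-of-range values are silently truncated or clamped. Java casts compile to these instructions, so the behavior can be observed directly; the class and method names below are illustrative only.

```java
class Conversions {
    static byte toByte(int x)    { return (byte) x; }   // i2b: keeps the low 8 bits
    static short toShort(int x)  { return (short) x; }  // i2s: keeps the low 16 bits
    static char toChar(int x)    { return (char) x; }   // i2c: keeps the low 16 bits, unsigned
    static int toInt(double x)   { return (int) x; }    // d2i: clamps to the int range, NaN becomes 0
    static float toFloat(int x)  { return (float) x; }  // i2f: may round to the nearest float

    public static void main(String[] args) {
        System.out.println(toByte(300));            // 44
        System.out.println(toShort(70000));         // 4464
        System.out.println((int) toChar(-1));       // 65535
        System.out.println(toInt(1e20));            // 2147483647
        System.out.println(toInt(Double.NaN));      // 0
        System.out.println((int) toFloat(16777217)); // 16777216
    }
}
```

The last line shows that even a widening conversion such as i2f can lose precision, since a float has only 24 bits of significand.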

Objects

• new - create new class instance
• newarray - create new array
• getfield, putfield - access instance variables
• getstatic, putstatic - access class variables
• aload, astore (with a type prefix, e.g. iaload, aastore) - push array elements onto, pop them from, the stack
• arraylength
• instanceof, checkcast - runtime narrowing checks

Stack management

• pop, dup, dup_x, swap

Control transfer

• if_icmpeq, if_icmplt, etc. - compare ints and branch
• ifeq, iflt, etc. - compare int with zero and branch
• if_acmpeq, if_acmpne - compare refs and branch
• ifnull, ifnonnull - compare ref with null and branch
• cmp - compare (non-integer) values and push result code (-1, 0, 1)
• tableswitch, lookupswitch - for switch statements
• goto - target is offset in method code
• jsr, ret - intended for finally
• athrow - throw explicit exception

Method invocation

• invokevirtual - for ordinary instance methods
• invokeinterface - for interface methods
• invokespecial - for constructor (<init>), private, or superclass methods
• invokestatic - for static methods
• return

Constant Pool

The constant pool contains the following kinds of entries:

• Utf8 - Unicode string in UTF-8 format.

• Integer, Float, Long, Double

• String - String constant, represented by Utf8.

• Class - Fully-qualified Java class name, represented by Utf8.

• NameAndType - Simple field or method name plus field or method descriptor, each represented by Utf8.

• Fieldref, Methodref, InterfaceMethodref - Class plus NameAndType.

Descriptors are strings that encode type information for fields or methods in terms of base types and fully-qualified class names. Method descriptors include the types of method parameters and result.
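
To see what method descriptors look like, the standard java.lang.invoke.MethodType class can print them. The wrapper below (Descriptors.describe) is just a convenience made up for this example.

```java
import java.lang.invoke.MethodType;

class Descriptors {
    // Builds the JVM method descriptor for the given result and parameter types.
    static String describe(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(describe(void.class, String[].class));            // ([Ljava/lang/String;)V
        System.out.println(describe(void.class, int.class));                 // (I)V
        System.out.println(describe(double.class, long.class, Object.class)); // (JLjava/lang/Object;)D
    }
}
```

The first two lines are the descriptors of main(java.lang.String[]) and println(int) from the Count example. Base types are single letters (I for int, J for long, D for double, V for void), class names are written as L...; with slashes in place of dots, and [ marks an array dimension.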

Resolution

Entries in the constant pool are resolved when first referenced by an executing instruction. Resolution of ref constants is a very complex process involving loading, linking (verifying and preparing), and initializing the Class of the ref and any classes on which it depends.

The end result of the resolution process is a direct pointer to the runtime representation of an object and/or an offset into that representation. This information is used to execute the provoking instruction, and may be used to rewrite that instruction into a more efficient form (e.g., Sun's quick-form instructions).

Once a constant pool entry has been resolved, subsequent references to it always use the results of the original resolution.
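
The resolve-once behavior amounts to memoization per constant pool index. The sketch below shows only that caching aspect; LazyPool and its resolver callback are hypothetical stand-ins for the real loading, linking, and initialization machinery, and the class is not thread-safe.

```java
import java.util.function.IntFunction;

class LazyPool {
    private final Object[] resolved;             // null means "not yet resolved"
    private final IntFunction<Object> resolver;  // stand-in for the expensive resolution step
    int resolutions = 0;                         // counts how often the resolver actually ran

    LazyPool(int size, IntFunction<Object> resolver) {
        this.resolved = new Object[size];
        this.resolver = resolver;
    }

    // Resolves the entry on first use; later calls reuse the original result.
    Object get(int index) {
        if (resolved[index] == null) {
            resolved[index] = resolver.apply(index);
            resolutions++;
        }
        return resolved[index];
    }

    public static void main(String[] args) {
        LazyPool pool = new LazyPool(8, i -> "resolved#" + i);
        Object first = pool.get(6);
        Object second = pool.get(6);
        System.out.println(first == second);   // true
        System.out.println(pool.resolutions);  // 1
    }
}
```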

Exception table

In addition to a sequence of byte-code instructions, each method has an exception table describing all the exception handlers defined for the method.

Each entry in the table describes one handler body, and gives:

• The starting and ending PCs for which the handler applies.

• The PC of the handler code.

• The subclass of Throwable caught by this handler.

PC ranges are always either nested or disjoint, never partially overlapping.

When an exception is raised, the table is searched in order; the handler of the first entry whose PC range covers the raise PC and whose class matches the thrown value is invoked.

Code for handlers is normally just appended to the main-line method code.
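
The table search described above can be sketched as a linear scan. The class and field names are made up for this example, and the catch type is represented by a java.lang.Class rather than a constant pool index. In the class file format the ending PC is exclusive, which the range test below reflects.

```java
import java.util.List;

class ExceptionTable {
    static final class Entry {
        final int startPc;    // first covered PC (inclusive)
        final int endPc;      // end of covered range (exclusive)
        final int handlerPc;  // where the handler code begins
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler PC of the first matching entry, or -1 if none
    // matches (the exception then propagates to the caller's frame).
    static int findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry e : table) {
            if (pc >= e.startPc && pc < e.endPc && e.catchType.isInstance(thrown)) {
                return e.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical method with an inner try nested inside an outer one.
        List<Entry> table = List.of(
            new Entry(4, 10, 20, ArithmeticException.class),  // inner handler listed first
            new Entry(0, 16, 30, Exception.class));           // outer handler
        System.out.println(findHandler(table, 6, new ArithmeticException()));   // 20
        System.out.println(findHandler(table, 6, new NullPointerException()));  // 30
        System.out.println(findHandler(table, 12, new ArithmeticException()));  // 30
        System.out.println(findHandler(table, 6, new Error()));                 // -1
    }
}
```

Listing the inner range before the outer one is what makes the in-order search pick the innermost applicable handler.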

Java Class File Format

The class file format is the real standard of binary interoperability for JVM programs. Each class file describes a single class or interface. It is a stream of bytes, which may be obtained from a file, over a network, or elsewhere.

The class file contains:

• Magic number and compiler version information.

• Constant pool.

• Access flags for this class.

• Name of this class, its super-class, and its direct superinterfaces.

• Number, names, access flags, type descriptors, and values (if constant) for its fields.

• Number, names, access flags, type descriptors, code, and exception tables for its methods.

• Additional attribute information (e.g., for debugging) may be attached at the class, field, or method level.
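
The first few fields of this layout are easy to decode by hand. The sketch below reads the magic number, version, and constant pool count from a byte array; ClassHeader.parse is a name made up for this example, and the sample bytes in main are hand-built rather than taken from a real class file.

```java
import java.nio.ByteBuffer;

class ClassHeader {
    // Returns { major version, minor version, constant_pool_count }.
    static int[] parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);  // ByteBuffer is big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = buf.getShort() & 0xffff;  // minor version is stored before major
        int major = buf.getShort() & 0xffff;
        int constantPoolCount = buf.getShort() & 0xffff;
        return new int[] { major, minor, constantPoolCount };
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor = 3, major = 45, constant_pool_count = 16.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x03, 0x00, 0x2D, 0x00, 0x10
        };
        int[] h = parse(header);
        System.out.println("version " + h[0] + "." + h[1] + ", constant_pool_count " + h[2]);
        // version 45.3, constant_pool_count 16
    }
}
```

To inspect a real class, the same method could be given the contents of a .class file, for example via java.nio.file.Files.readAllBytes. The stored count is one more than the number of constant pool entries, because index 0 is unused.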

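Example: Manipulating Doubles

[Listing not preserved in this copy.]

Example: Trivial Arithmetic

[Listing not preserved in this copy.]

Example: More Arithmetic

[Listing not preserved in this copy.]

Example: Operand Stack Manipulation (1)

[Listing not preserved in this copy.]

Example: Operand Stack Manipulation (2)

[Listing not preserved in this copy.]

Example: Arrays

[Listing not preserved in this copy.]

Example: Objects

[Listing not preserved in this copy.]

Example: Objects (2)

[Listing not preserved in this copy.]

Example: Objects (3)

[Listing not preserved in this copy.]

Example: Objects (4)

[Listing not preserved in this copy.]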



Andrew P. Tolmach
Thu Jan 15 10:27:27 PST 1998