Why study the Java Virtual Machine (JVM)?
Understanding (and improving on) the output of Java compilers (just like looking at machine code from C).
Understanding the input to just-in-time compilers and other optimized Java execution engines.
Understanding portability issues.
Understanding security issues.
The JVM can be used as a target for other languages.
You might want to implement your own JVM.
Example: Count
Count.java:
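The source listing itself is not preserved at this point. The following is a reconstruction, inferred from the disassembly of main shown next (a counter in local variable 1, a loop bound of 10, and a call to println(int)); the original file may have differed in layout or detail.

```java
// Reconstructed from the bytecode; not the original listing.
class Count {
    public static void main(String[] args) {
        // Local variable 0 holds args, local variable 1 holds i.
        for (int i = 0; i < 10; i++)
            System.out.println(i);
    }
}
```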
Example (cont.)
Compiled from Count.java
synchronized class Count extends java.lang.Object
    /* ACC_SUPER bit set */
{
    public static void main(java.lang.String[]);
    Count();
}

Method void main(java.lang.String[])
   0 iconst_0
   1 istore_1
   2 goto 15
   5 getstatic #6 <Field java.io.PrintStream out>
   8 iload_1
   9 invokevirtual #7 <Method void println(int)>
  12 iinc 1 1
  15 iload_1
  16 bipush 10
  18 if_icmplt 5
  21 return

Method Count()
   0 aload_0
   1 invokespecial #5 <Method java.lang.Object()>
   4 return
Java Virtual Machine Architecture
A JVM contains the following components:
Program Counter (per thread)
Stack (per thread)
Heap (shared) - contains all objects
Method Area (shared) - byte-codes and constant pools
Native method stacks (per thread, if required)
Method code is a sequence of byte-code instructions that implement methods (and constructors). The JVM byte-code is stack-based; most instructions take their operands from the stack and leave their results there.
Each class has a constant pool, which contains all the constant data referenced by the methods of that class, including numbers, strings, and symbolic names of other classes and members referenced by this class.
Stacks and Frames
There is one stack per thread. A stack consists of a sequence of frames; frames need not be contiguous in memory. Frame size and overall stack size may be limited by implementations.
One frame is associated with each method invocation. Each frame contains two areas, each of statically fixed size (per method):
local variable storage associated with the method, and
an operand stack for evaluating expressions within the method and for communicating arguments and results with other methods.
The local variable area is an array of words, addressed by word offset from the array base. Most locals occupy one word; long and double values occupy two consecutive words. The arguments to a method (including this, for instance methods) always appear as its initial local variables.
The operand stack is a stack of words. Most operands occupy one word; long and double values occupy two consecutive words, which must not be manipulated independently.
Frames may optionally contain additional information, e.g., for debugging.
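As a rough illustration of the frame layout described above, here is a minimal sketch in Java. The class name FrameSketch, the choice of a 32-bit int as the "word", and the order in which the two halves of a long are pushed are all assumptions made for the example; real implementations are free to lay frames out differently.

```java
// Hypothetical model of a frame: two fixed-size areas, sized per method.
final class FrameSketch {
    final int[] locals;        // local variable area, addressed by word offset
    final int[] operandStack;  // operand stack of words
    int sp;                    // number of words currently on the operand stack

    FrameSketch(int maxLocals, int maxStack) {
        locals = new int[maxLocals];
        operandStack = new int[maxStack];
    }

    // Arguments always appear as the initial local variables.
    static FrameSketch forInvocation(int maxLocals, int maxStack, int... args) {
        FrameSketch f = new FrameSketch(maxLocals, maxStack);
        System.arraycopy(args, 0, f.locals, 0, args.length);
        return f;
    }

    void push(int word) { operandStack[sp++] = word; }

    int pop() { return operandStack[--sp]; }

    // A long occupies two consecutive words (high word first, by assumption).
    void pushLong(long v) {
        push((int) (v >>> 32));
        push((int) v);
    }

    // The two words are always popped together, never independently.
    long popLong() {
        int low = pop();
        int high = pop();
        return ((long) high << 32) | (low & 0xffffffffL);
    }
}
```

Exceeding either array bound here raises ArrayIndexOutOfBoundsException; a real JVM instead relies on verification to guarantee that the statically declared sizes are never exceeded.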
Types and Verification
The JVM directly supports each of the primitive Java types (except boolean, which is mapped to int). Floating-point arithmetic follows IEEE 754. Values of reference types (classes, interfaces, arrays) are represented as heap pointers; layout of these values is implementation-dependent.
Data values are not tagged with type information, but instructions are. When executing, the JVM assumes that instructions are always operating on values of the correct type. The instruction set is designed to make it possible to verify that any given method is type-correct, without executing it. The JVM performs verification on any bytecode derived from an untrusted source (e.g., over the network).
At any given point of execution, each entry in the local variable area and the operand stack must have a well-defined type state; i.e., it must be possible to deduce the type of each entry unambiguously.
This is an unusual property for stacks! To enforce it, JVM code must be written with care. For example, when there are two execution paths to the same PC, they must arrive with identical type state. So, for example, it is impossible to use a loop to copy an array onto the stack, because each iteration would reach the top of the loop with a deeper stack than the one before.
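The array-copying case can be made concrete with a small sketch. It models a type state as a list of type names, one per operand stack entry, and applies the rule as stated here (identical state on every path); the names are hypothetical, and a real verifier is more elaborate, for instance in how it treats different reference types meeting at a join point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "same type state on every path" rule.
final class TypeStateSketch {
    // Two paths may meet at one PC only if depth and entry types agree.
    static boolean compatible(List<String> pathA, List<String> pathB) {
        return pathA.equals(pathB);
    }

    // A loop whose body pushes one array element per iteration.
    static boolean pushingLoopIsAcceptable() {
        List<String> onEntry = new ArrayList<>();          // empty stack at the loop head
        List<String> onBackEdge = new ArrayList<>(onEntry);
        onBackEdge.add("int");                             // one element deeper after the body
        return compatible(onEntry, onBackEdge);            // depths differ, so this is false
    }
}
```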
Instruction Set
Each JVM instruction consists of a one-byte op code followed by zero or more parameters. Instructions are only byte-aligned. Multi-byte parameters are stored in big-endian order.
The inner loop of the JVM execution engine (ignoring exceptions) is effectively:
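The original listing is not preserved here. In outline, the loop fetches an opcode, fetches any parameters, performs the action for that opcode, and repeats. A minimal sketch of such a loop follows, handling only a handful of int instructions; the class TinyInterpreter and its sample program are hypothetical, although the opcode values are the real ones.

```java
// Hypothetical fetch-decode-execute loop for a few int-only instructions.
final class TinyInterpreter {
    static final int ICONST_0 = 0x03, BIPUSH = 0x10, ILOAD_1 = 0x1b, ILOAD_2 = 0x1c,
            ISTORE_1 = 0x3c, ISTORE_2 = 0x3d, IADD = 0x60, IINC = 0x84,
            IF_ICMPLT = 0xa1, GOTO = 0xa7, IRETURN = 0xac;

    // Sums i for i = 0..9, using local 1 for i and local 2 for the sum.
    static final byte[] SUM_BELOW_TEN = {
        0x03, 0x3c,                            //  0: iconst_0, istore_1
        0x03, 0x3d,                            //  2: iconst_0, istore_2
        (byte) 0xa7, 0x00, 0x0a,               //  4: goto 14
        0x1c, 0x1b, 0x60, 0x3d,                //  7: iload_2, iload_1, iadd, istore_2
        (byte) 0x84, 0x01, 0x01,               // 11: iinc 1 1
        0x1b, 0x10, 0x0a,                      // 14: iload_1, bipush 10
        (byte) 0xa1, (byte) 0xff, (byte) 0xf6, // 17: if_icmplt 7
        0x1c, (byte) 0xac                      // 20: iload_2, ireturn
    };

    static int run(byte[] code, int maxLocals, int maxStack) {
        int[] locals = new int[maxLocals];
        int[] stack = new int[maxStack];
        int sp = 0, pc = 0;
        while (true) {
            int opcode = code[pc] & 0xff;      // fetch the one-byte opcode
            switch (opcode) {                  // decode and execute
                case ICONST_0: stack[sp++] = 0; pc += 1; break;
                case BIPUSH:   stack[sp++] = code[pc + 1]; pc += 2; break; // sign-extended byte
                case ILOAD_1:  stack[sp++] = locals[1]; pc += 1; break;
                case ILOAD_2:  stack[sp++] = locals[2]; pc += 1; break;
                case ISTORE_1: locals[1] = stack[--sp]; pc += 1; break;
                case ISTORE_2: locals[2] = stack[--sp]; pc += 1; break;
                case IADD: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a + b;
                    pc += 1;
                    break;
                }
                case IINC: locals[code[pc + 1] & 0xff] += code[pc + 2]; pc += 3; break;
                case IF_ICMPLT: {
                    int b = stack[--sp], a = stack[--sp];
                    pc += (a < b) ? branchOffset(code, pc) : 3;
                    break;
                }
                case GOTO:    pc += branchOffset(code, pc); break;
                case IRETURN: return stack[--sp];
                default: throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    // Two-byte parameters are big-endian; branch offsets are signed and
    // relative to the address of the branch instruction itself.
    static int branchOffset(byte[] code, int pc) {
        return (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
    }
}
```

Calling TinyInterpreter.run(TinyInterpreter.SUM_BELOW_TEN, 3, 2) returns 45. The sketch ignores exceptions, method invocation, and the constant pool entirely.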
Most instructions take their operands from the top of the stack (popping them in the process) and push their result back on the top of the stack. A few operate directly on local variables.
Most instructions encode the type of their operands; thus, many instructions have multiple versions distinguished by their prefix (i,l,f,d,b,s,c,a).
The instruction set is not totally orthogonal; in particular, few operations are provided for bytes, shorts, and chars, and integer comparisons are much simpler than non-integer ones. In all, 201 out of 256 possible op-code values are used.
Families of instructions
Instructions group into families. Each family does the same basic operation, but has a variety of members distinguished by operand type and built-in arguments.
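Example: load pushes the value of a local variable (specified as a parameter) onto the stack. Variants: iload, lload, fload, dload, and aload take the local variable index as a parameter; short forms such as iload_0 through iload_3 build the index into the op code.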
Load and Store
load - push local variable onto stack
store - pop top-of-stack into local variable
push,ldc,const - push constant onto stack
wide - modify following load or store to have wider parameter.
Arithmetic and Logic
add,sub,mul, div, rem, neg
shl,shr, ushr
or, and, xor
iinc - increment local variable
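The integer forms of div and rem (idiv, ldiv, irem, lrem) throw an ArithmeticException given a zero divisor; the floating-point forms never throw and instead produce an infinity or NaN as IEEE 754 requires.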
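The zero-divisor behaviour is visible directly from Java source, since the / and % operators compile to the div and rem families. A small demonstration follows; the class and method names are made up for the example.

```java
// Zero divisors: integer division throws, floating-point division does not.
final class DivideByZeroDemo {
    // Compiles to idiv; a zero divisor raises ArithmeticException.
    static String intDivide(int a, int b) {
        try {
            return String.valueOf(a / b);
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    // Compiles to ddiv; a zero divisor yields an infinity or NaN.
    static double doubleDivide(double a, double b) {
        return a / b;
    }

    // Compiles to drem; a zero divisor yields NaN.
    static double doubleRemainder(double a, double b) {
        return a % b;
    }
}
```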
Conversions
i2l,i2f,i2d,l2f,l2d,f2d.
i2b,i2c,i2s, etc. - never raise exception.
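Because narrowing conversions never raise an exception, out-of-range values are silently truncated or clamped instead. The casts below compile to the corresponding conversion instructions; the class name is hypothetical.

```java
// Narrowing casts lose information quietly rather than throwing.
final class NarrowingDemo {
    static byte toByte(int v)   { return (byte) v; }   // i2b: keeps the low 8 bits
    static char toChar(int v)   { return (char) v; }   // i2c: keeps the low 16 bits, unsigned
    static short toShort(int v) { return (short) v; }  // i2s: keeps the low 16 bits, signed
    static int toInt(double v)  { return (int) v; }    // d2i: NaN becomes 0, out-of-range clamps
}
```

For example, toByte(300) is 44, toShort(70000) is 4464, toInt(Double.NaN) is 0, and toInt(1e20) is Integer.MAX_VALUE.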
Objects
new - create new class instance
newarray - creates new array
getfield,putfield - access instance variables
getstatic,putstatic - access class variables
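aload, astore - push array element onto stack, pop stack value into array element (the typed forms are iaload, iastore, etc.; the instructions named plain aload and astore belong to the load and store families and move references to and from locals)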
arraylength
instanceof, checkcast - runtime narrowing checks
Stack management
pop,dup,dup_x,swap
Control transfer
if_icmpeq,if_icmplt, etc. - compare ints and branch
ifeq,iflt, etc. - compare int with zero and branch
if_acmpeq, if_acmpne - compare refs and branch
ifnull,ifnonnull - compare ref with null and branch
cmp - compare (non-integer) values and push result code (-1,0,1)
tableswitch,lookupswitch - for switch statements
goto - target is offset in method code
jsr,ret - intended for finally
athrow - throw explicit exception
Method invocation
invokevirtual - for ordinary instance methods
invokeinterface - for interface methods
invokespecial - for constructor (<init>), private, or superclass methods
invokestatic - for static methods
return
Constant Pool
The constant pool contains the following kinds of entries:
Utf8 - Unicode string in UTF-8 format.
Integer,Float,Long,Double
String - String, represented by Utf8
Class - Fully-qualified Java class name, represented by Utf8
NameAndType - Simple field or method name plus field or method descriptor, each represented by Utf8.
Fieldref, Methodref, InterfaceMethodref - Class plus NameAndType.
Descriptors are strings that encode type information for fields or methods in terms of base types and fully-qualified class names. Method descriptors include the types of method parameters and result.
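One convenient way to see method descriptors from Java itself is the standard java.lang.invoke.MethodType class, whose toMethodDescriptorString method produces the same encoding. The wrapper class below is only for illustration.

```java
import java.lang.invoke.MethodType;

// Prints the descriptor strings for two method signatures.
final class DescriptorDemo {
    // void main(String[]): "[" marks an array, "L...;" a class, "V" void.
    static String mainDescriptor() {
        return MethodType.methodType(void.class, String[].class)
                .toMethodDescriptorString();
    }

    // int f(long, Object): "J" is long, "I" is int.
    static String mixedDescriptor() {
        return MethodType.methodType(int.class, long.class, Object.class)
                .toMethodDescriptorString();
    }
}
```

Here mainDescriptor() yields ([Ljava/lang/String;)V and mixedDescriptor() yields (JLjava/lang/Object;)I. Note that class names inside descriptors use slashes rather than dots.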
Resolution
Entries in the constant pool are resolved when first referenced by an executing instruction. Resolution of ref constants is a very complex process involving loading, linking (verifying and preparing), and initializing the Class of the ref and any classes on which it depends.
The end result of the resolution process is a direct pointer to the runtime representation of an object and/or an offset into that representation. This information is used to execute the provoking instruction, and may be used to rewrite that instruction into a more efficient form (e.g., Sun's "quick" instruction forms).
Once a constant pool entry has been resolved, subsequent references to it always use the results of the original resolution.
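The internal pointers produced by resolution are not visible from Java code, but one consequence is: a class is initialized on its first use, and only once. The sketch below records when a static initializer runs; the class names are hypothetical, and it demonstrates only initialization timing, not loading or linking.

```java
// Shows that a class's static initializer runs once, at first use.
final class LazyInitDemo {
    static final StringBuilder log = new StringBuilder();

    static final class Helper {
        static { log.append("Helper initialized;"); }
        static int value() { return 42; }
    }

    static String run() {
        log.append("before;");
        Helper.value();          // first use: triggers initialization of Helper
        log.append("after;");
        Helper.value();          // later uses: nothing is initialized again
        return log.toString();
    }
}
```

A single call to run() returns before;Helper initialized;after; which shows that the initializer ran at the first invokestatic rather than when LazyInitDemo itself was loaded.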
Exception table
In addition to a sequence of byte-code instructions, each method has an exception table describing all the exception handlers defined for the method.
Each entry in the table describes one handler body, and gives:
The starting and ending PCs for which the handler applies.
The PC of the handler code.
The subclass of Throwable caught by this handler.
PC ranges are always either nested or discrete, never overlapping.
When an exception is raised, the table is searched in order; the first entry whose PC range covers the raise PC and whose class matches the thrown value is invoked.
Code for handlers is normally just appended to the main-line method code.
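The table search can be sketched in a few lines. The class and field names below are assumptions chosen to resemble the class file format, in which the start PC is inclusive and the end PC exclusive; a return value of -1 stands for "no handler in this method", at which point a real JVM would pop the frame and continue the search in the caller.

```java
// Hypothetical model of exception table lookup within one method.
final class ExceptionTableSketch {
    static final class Entry {
        final int startPc;    // first PC covered (inclusive)
        final int endPc;      // end of the covered range (exclusive)
        final int handlerPc;  // where the handler code begins
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Searches in table order and returns the first matching handler PC.
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc;
            if (inRange && e.catchType.isInstance(thrown)) {
                return e.handlerPc;
            }
        }
        return -1;
    }
}
```

Since the search takes the first match, an inner try block's entries must precede those of any enclosing block for the inner handler to win.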
Java Class File Format
The class file format is the real standard of binary interoperability for JVM programs. Each class file describes a single class or interface. It is a stream of bytes, which may be obtained from a file, over a network, or elsewhere.
The class file contains:
Magic number and class file format version (minor and major).
Constant pool.
Access flags for this class.
Name of this class, its super-class, and its direct superinterfaces.
Number, names, access flags, type descriptors, and values (if constant) for its fields.
Number, names, access flags, type descriptors, code, and exception tables for its methods.
Additional attribute information (e.g., for debugging) may be attached at the class, field, or method level.
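The first few items of a class file are easy to decode by hand. The sketch below reads the magic number, the minor and major version, and the constant pool count from a byte array, relying on the fact that java.nio.ByteBuffer is big-endian by default, matching the class file format. The class name and its fields are hypothetical, and everything after the constant pool count is ignored.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the fixed-size start of a class file.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int cpCount) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes);   // big-endian by default
        int magic = in.getInt();                       // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = in.getShort() & 0xffff;            // u2 minor_version
        int major = in.getShort() & 0xffff;            // u2 major_version
        int cpCount = in.getShort() & 0xffff;          // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }
}
```

Feeding it the first ten bytes of any compiled class should succeed; the constant pool entries themselves are variable-length and need a tag-driven loop to read.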
Example: Manipulating Doubles
Example: Trivial Arithmetic
Example: More Arithmetic
Example: Operand Stack Manipulation (1)
Example: Operand Stack Manipulation (2)
Example: Arrays
Example: Objects
Example: Objects (2)
Example: Objects (3)
Example: Objects (4)