LECTURE 7 -- Object-oriented language implementation

Objects as Records
------------------

Basic representation idea: an object is a record of fields.

class A {
  int x;
  double y;
  String z;
}

A a = new A();
a.x = 10;
a.y = 3.14;
a.z = "Hello";

a:   -----------
  x  |   10    |
     -----------
  y  |  3.14   |
     |         |
     -----------
  z  |   ------|---> "Hello"
     -----------

The implementation can fix a canonical order of fields based on the class definition.

To look up a field, can search the class definition:

---------------------
/* Compute field offset in object. */
int object_field_offset(Class *class,char *name,char *desc) {
  int offset = 0;
  Field *f;
  /* Walk the fields in declaration order, accumulating sizes. */
  for (f = class->fields; f < class->fields + class->fields_count; f++)
    if ((f->access_flags & ACC_STATIC) == 0) {  /* static fields take no space in the object */
      char *f_desc = get_utf8(class,f->descriptor_index);
      if (STR_EQ(name,get_utf8(class,f->name_index)) &&
          STR_EQ(desc,f_desc))  /* type comparison is probably superfluous */
        return offset;
      else
        offset += size_descriptor(f_desc);
    }
  die("field %s not found in class %s",name,get_class(class,class->this_class));
}
--------------------------

In fact, this is the best we can do in the absence of strong static types
(or even of classes). But with static types, the field's offset is statically fixed.
In practice, real interpreters will rewrite bytecode so that field fetches and
stores carry the offset (in place of the name).

Simple Inheritance
------------------

Consider these classes:

class A { int a; }
class B extends A { int b; }
class C extends B { int c; }
class D extends A { int d; }

A:   -----------
  a  |         |
     -----------

B:   -----------
  a  |         |
     -----------
  b  |         |
     -----------

C:   -----------
  a  |         |
     -----------
  b  |         |
     -----------
  c  |         |
     -----------

D:   -----------
  a  |         |
     -----------
  d  |         |
     -----------

Again, can find fields by doing naive lookup in Class descriptions, assuming
we have pointers from each class to its parent.

But can also use *fixed* offsets again, by making subclasses *append* their
fields to those of the superclass.
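The append-only layout can be sketched in C by embedding each superclass record as the first member of its subclass record. This is only an illustration under that assumption: the struct definitions and the helpers get_a and read_a_through_superclass are hypothetical names, not part of the interpreter code shown earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C records mirroring A, B extends A, C extends B, D extends A.
   Each subclass record begins with its superclass record, so inherited
   fields keep the offsets they had in the superclass. */
struct A { int a; };
struct B { struct A super; int b; };
struct C { struct B super; int c; };
struct D { struct A super; int d; };

/* Code compiled knowing only A: reads field a at A's fixed offset. */
int get_a(const struct A *obj) {
    return obj->a;
}

/* Builds a C object and reads its a field through an A-typed pointer. */
int read_a_through_superclass(void) {
    struct C c;
    c.super.super.a = 7;
    c.super.b = 8;
    c.c = 9;
    /* The superclass part sits at offset 0, so a pointer to the C object
       is also a valid pointer to its A part. */
    assert(offsetof(struct C, super) == 0);
    assert(offsetof(struct B, super) == 0);
    return get_a((const struct A *)&c);
}
```

Calling read_a_through_superclass() returns 7: get_a never learns that it was handed a C, which is the substitution property the fixed offsets are meant to preserve.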
Again, in practice will rewrite bytecode to access fields via these fixed offsets.

Key invariant: must always be able to substitute a subclass object for a
superclass object.

Object Construction
-------------------

- in Java, turns into a combination of the NEW operator plus a call to a
  special kind of constructor method (with statically fixed address),
  which takes 'this' as argument
- separating NEW from the constructor makes it easier to invoke chains of
  constructors.

A few words about object allocation
-----------------------------------

Heap allocation is expensive (directly and indirectly).

Sometimes semantically safe to allocate on stack instead -- requires
*escape analysis*. Objects that are provably never stored or returned
can be stack-allocated.

Can also sometimes *inline* objects, if they are never referenced independently.

--------------------------------------------------

Object Methods
--------------

class X {
  int x = 10;
  int f(int a) { return x + a; }
}

Basic encoding of f is a function

  int f'(X this,int a)

so call x.f(42) turns into f'(x,42)

In certain highly limited circumstances (e.g. if X has no subclass that
overrides the definition of f), the address of f' is a compile-time constant.

Note that *overloading* of method names in Java (and C++) is disambiguated
by considering the entire *signature* of the method (including argument
and return types) as part of the name.

Dynamic Method Dispatch
-----------------------

class X {
  int x = 10;
  int f(int a) { return x + a; }
  int g() { return x; }
}

class Y extends X {
  int y = 20;
  int f(int a) { return x + y - a; }
}

X x1 = new X();
X x2 = new Y();
X x3 = b ? x1 : x2;
x3.f(1) + x3.g();

Need to select code for x3.f based on actual class of x3 at runtime.

Basic solution: tag record with its class, and dispatch on that.
x1:    -----------
  tag  |    X    |
       -----------
  x    |   10    |
       -----------

x2:    -----------
  tag  |    Y    |
       -----------
  x    |   10    |
       -----------
  y    |   20    |
       -----------

Class Info records in more detail
----------------------------------

- Can look up tag info and then follow parent pointers naively (as with fields)
- Better: use per-class dispatch tables (often called ``vtables'')
  - again, can use fixed offsets

X:   -----------
  f  |    -----|-------------> code for X.f
     -----------
  g  |    -----|-----
     -----------    |
                    ---------> code for X.g
                    |
Y:                  |
     -----------    |
  f  |    -----|-------------> code for Y.f
     -----------    |
  g  |    -----|-----
     -----------

- Small space waste due to duplicated entries, but not a problem in practice
- Important thing is that tables are dense!

Multiple Inheritance
--------------------

Multiple inheritance doesn't admit such a simple offset scheme:

class A { int a; }
class B { int b; }
class C extends A,B {   // not real Java
  int c;
}

Here C records need both a and b fields, but to keep the fixed offsets used
by code for A and code for B, both would have to occupy the same (first) slot!

Usual solution in C++ is rather complicated: it involves adjusting 'this'
pointers when treating a subtype as a supertype, and hence requires pointers
into the middle of records (not good for GC).
See Stroustrup, "Multiple Inheritance for C++",
http://www-plan.cs.colorado.edu/diwan/class-papers/mi.pdf

Interfaces
----------

Java doesn't support multiple inheritance, so avoids this problem with fields,
but very similar issues arise with method tables for *interfaces*.

interface I {
  void f();
  void g();
}

class P implements I {
  void f() { ... }
  void g() { ... }
  void h() { ... }
}

class Q implements I {
  void e() { ... }
  void f() { ... }
  void g() { ... }
}

What offset should we assign to f? (Note that there may be other constraints
on ordering of methods in P and Q due to class inheritance.)

Some solutions
--------------

1. Assign unique slot #'s to each method name (signature, really) in the
   program. Require all vtables to be laid out this way.
For example:

P:   -----------
  e  |    0    |
     -----------
  f  |    -----|------------> code for P.f
     -----------
  g  |    -----|------------> code for P.g
     -----------
  h  |    -----|------------> code for P.h
     -----------

Q:   -----------
  e  |    -----|------------> code for Q.e
     -----------
  f  |    -----|------------> code for Q.f
     -----------
  g  |    -----|------------> code for Q.g
     -----------
  h  |    0    |
     -----------

Note that we can truncate these records after the last method this class
implements, since subsequent slots can never be referenced. Indeed for Java
we must, because we don't know all classes and methods statically.

2. Obvious problem: the vtables can get (very) sparse!

3. Mitigations:

3A. Reduce sparseness by making smart choices of slot #'s. Two methods never
    implemented by the same class can share the same slot #.
    Called ``selector coloring'' (similar to register allocation coloring).
    Can be quite effective.
    Problem: doing a good job of this requires knowing all method names ahead
    of time, which we don't in Java.

3B. Observe that the total number of *interfaces* is much less than the total
    number of *methods*, and use *indirection*.
    Assign unique #'s to each *interface* in the system.
    For each class, include a sparse interface dispatch table (IDT) adjacent
    to the vtable. For each interface implemented by the class, the
    corresponding entry points to a dense table of methods.

For example:

interface I {
  void f();
  void g();
}

interface J {
  void g();
  void h();
  void k();
}

class P implements I,J {
  void f() { ... }
  void g() { ... }
  void h() { ... }
  void j() { ... }
  void k() { ... }
}

Suppose I is interface #2 and J is interface #4.
                                                                  P as J
     -----------                                               -------------
  J  |    -----|------------------------------------------> g  |     ------|---------
     -----------                                               -------------        |
     |    0    |                   P as I                   h  |     ------|------  |
     -----------                ------------                   -------------     |  |
  I  |    -----|-----------> f  |     -----|-----------     k  |     ------|---  |  |
     -----------                ------------          |        -------------  |  |  |
     |    0    |             g  |     -----|-------   |                       |  |  |
     -----------                ------------      |   |                       |  |  |
     |    0    |                                  |   |                       |  |  |
P:   -----------                                  |   |                       |  |  |
  f  |    -----|-------------------------------------------------------------------------> code for P.f
     -----------                                  |                           |  |  |
  g  |    -----|-------------------------------------------------------------------------> code for P.g
     -----------                                                              |  |
  h  |    -----|-------------------------------------------------------------------------> code for P.h
     -----------                                                              |
  j  |    -----|-------------------------------------------------------------------------> code for P.j
     -----------                                                              |
  k  |    -----|-------------------------------------------------------------------------> code for P.k
     -----------

Promising, but costs an extra indirection, and the interface dispatch tables
(IDTs) are still very sparse.

3C. HASH (imperfectly) the interface method signatures implemented by a class
    into a compact table. When there is no hash conflict, dispatch directly
    to the method code. If there is a conflict, dispatch to a
    dynamically-generated conflict resolution stub that encodes a (binary)
    search for the correct method.

Both 3B and 3C work quite well in practice.

Lowering Costs of Virtual Method Invocation
-------------------------------------------

Function calls are expensive, particularly calls to small functions.
Moreover, virtual (indirect) dispatch is a very bad match for modern hardware,
because jumps to unknown targets are very costly.

Would like to turn into a direct function call (``devirtualize'') or even inline.
Obviously cannot do so in general. But sometimes we can.
Idea: most jumps aren't really totally unknown -- particularly if:

- we can analyse program class structure; and/or
- we can record dynamic profile information about call sites.

CLASS HIERARCHY ANALYSIS takes advantage of the first: if the class structure
shows that no subclass of the receiver's static class overrides the method,
the call has only one possible target and can be made direct (or inlined).

INLINE CACHING takes advantage of the second by changing an indirect call into
a direct call to the most likely function. Code at the top of that function
checks that it really is the intended recipient (i.e. that the receiver has
the expected class); if not, it does an ordinary dispatch.

POLYMORPHIC INLINE CACHING does the same thing when a (small) SET of
possibilities is likely.
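A minimal C sketch of the inline caching check, using the X and Y classes from the dynamic dispatch example. It assumes a one-slot vtable and a call site that has already been rewritten to guess class X. The names Class, Object, dispatch_f, X_f_checked, call_site and cache_misses are all hypothetical, and the runtime patching of the call site that a real system performs is not modeled.

```c
#include <assert.h>

struct Object;
typedef int (*Method)(struct Object *self, int a);

/* Hypothetical class record: a one-slot vtable holding f. */
struct Class { Method f; };

/* Every object starts with a tag pointing at its class record. */
struct Object { const struct Class *cls; int x; int y; };

int X_f(struct Object *self, int a) { return self->x + a; }
int Y_f(struct Object *self, int a) { return self->x + self->y - a; }

const struct Class class_X = { X_f };
const struct Class class_Y = { Y_f };

/* Ordinary dispatch: load the class tag, then jump through the vtable. */
int dispatch_f(struct Object *self, int a) {
    return self->cls->f(self, a);
}

/* Counts failed guesses, only so the sketch can be observed. */
int cache_misses = 0;

/* Checked entry point for X.f: the prologue verifies that the receiver
   really is an X; otherwise it falls back to ordinary dispatch. */
int X_f_checked(struct Object *self, int a) {
    if (self->cls != &class_X) {
        cache_misses++;
        return dispatch_f(self, a);
    }
    return X_f(self, a);
}

/* The rewritten call site for x3.f(a): a direct call, no table lookup
   on the fast path. */
int call_site(struct Object *receiver, int a) {
    assert(receiver != 0);
    return X_f_checked(receiver, a);
}
```

With an X receiver holding x = 10, call_site(&x1, 1) takes the fast path and returns 11. With a Y receiver holding x = 10 and y = 20, the check fails, cache_misses is incremented, and ordinary dispatch returns 29. A polymorphic inline cache would replace the single comparison with a short sequence of comparisons, one per likely class.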