I. Generic Compiler Architecture

The compilation pipeline:

- Source code
    |
    V
o Lexical Analysis & Parsing
    |
    V
- Abstract Syntax Tree (AST)
    |
    V
o Type-checking and other Static Correctness Analysis
    |
    V
- (Revised) AST (for legal program)
    |
    V
o Intermediate Code Generation
    |
    V
- Intermediate Representation (IR) --> o Interpreter
    |
    V
o Machine-independent Optimization
    |
    V
- Revised IR
    |
    V
o Target Code Generation
    |
    V
- Machine Code
    |
    V
o Machine-dependent Optimization
    |
    V
- Improved Machine Code
    |
    V
o Code Emission
    |
    V
- Binaries (files or core images)
    |
    V
o Linker/Loader
    |
    V
- Core image that can be executed
  (Combined with runtime library: interface to O/S, memory management,
  thread support, etc.)

II. Issues in traditional compilation pipeline:

- Need to support separate compilation of modular programs
  (interaction at: typechecking time, linking of binaries).
- Need to support interoperability with other languages, at least at
  procedure call level.
- What is the granularity of compilation (one procedure at a time, or more?)
- What is the granularity of optimization (peephole, intra-procedural,
  inter-procedural?) (Note tradeoff between optimization quality and
  compilation speed.)
- IR can be pitched "high" or "low", or can use several IRs, suited for
  different optimization purposes.
- Target architecture can be different from that of compiler (host).
- Compiler may be designed to allow easy portability to multiple target
  machines and/or from multiple source languages.

III. The Java architecture:

- Source code (.java file)
    |
    V
o javac: Lexical Analysis & Parsing + Type-checking
    |
    V
- IR = Byte code (.class file)
    |
    V
o JVM: Verification (essentially repeating static checks)
  + (Interpretation OR Compilation + Loading + Executing)

IV. Issues in Java architecture:

- Mandated separation of front end and back end with published
  intermediate code.
- Back end doesn't trust provider of byte codes; hence verification step
  in JVM.
- Focus on high-speed compilation: JIT compilers; mixed
  interpreter/compiler (e.g., HotSpot); feedback-directed optimization.
- Focus on resource-bounded execution environment.
- Dynamic loading (and even reloading) of class definitions.
- (Except for the need to support dynamic loading, we could dispense with
  byte code and JVM, and use standard compiler architecture for Java too;
  some experimental systems do.)
- Byte code is relatively high-level for an IR (can recover source from
  it), and is better suited to being interpreted than to being optimized,
  so the compiler in a JVM often uses a lower-level IR.
- In this course, we can dispense with the front end, and just treat
  byte code as source.
- Microsoft's version of this picture explicitly makes its byte code
  (CIL, the intermediate language defined by the CLI standard) a
  multi-language common ground.
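
The front half of the pipeline in section I, and the "IR --> Interpreter" branch in particular, can be illustrated with a toy example. The sketch below is not part of the original notes: the class MiniPipeline, its nested types, and the PUSH/ADD/MUL instruction names are all hypothetical, chosen only for illustration. It parses integer expressions with + and *, builds an AST, emits a stack-machine IR loosely in the spirit of byte code, and interprets that IR. Type-checking, optimization, and target code generation are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy pipeline: source -> AST -> stack IR -> interpreter. All names are illustrative. */
public class MiniPipeline {

    // --- AST: integer literals and binary operations ---
    interface Expr {}
    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }
    static final class BinOp implements Expr {
        final char op; final Expr left, right;
        BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    }

    // --- Lexical analysis & parsing (recursive descent over digits, + and *) ---
    static final class Parser {
        private final String src;
        private int pos = 0;
        Parser(String src) { this.src = src.replace(" ", ""); }

        Expr parse() {
            Expr e = sum();
            if (pos != src.length()) throw new IllegalArgumentException("unexpected input at " + pos);
            return e;
        }
        private Expr sum() {            // sum := product ('+' product)*
            Expr e = product();
            while (peek('+')) { pos++; e = new BinOp('+', e, product()); }
            return e;
        }
        private Expr product() {        // product := number ('*' number)*
            Expr e = number();
            while (peek('*')) { pos++; e = new BinOp('*', e, number()); }
            return e;
        }
        private Expr number() {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
            return new Num(Integer.parseInt(src.substring(start, pos)));
        }
        private boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }
    }

    // --- Intermediate code generation: post-order walk emits stack-machine IR ---
    static void generate(Expr e, List<String> ir) {
        if (e instanceof Num) {
            ir.add("PUSH " + ((Num) e).value);
        } else {
            BinOp b = (BinOp) e;
            generate(b.left, ir);
            generate(b.right, ir);
            ir.add(b.op == '+' ? "ADD" : "MUL");
        }
    }

    public static List<String> compile(String source) {
        List<String> ir = new ArrayList<>();
        generate(new Parser(source).parse(), ir);
        return ir;
    }

    // --- Interpreter: executes the IR directly on an operand stack ---
    public static int interpret(List<String> ir) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : ir) {
            if (insn.startsWith("PUSH ")) {
                stack.push(Integer.parseInt(insn.substring(5)));
            } else {
                int right = stack.pop(), left = stack.pop();
                stack.push(insn.equals("ADD") ? left + right : left * right);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        List<String> ir = compile("1 + 2 * 3");
        System.out.println(ir);             // [PUSH 1, PUSH 2, PUSH 3, MUL, ADD]
        System.out.println(interpret(ir));  // 7
    }
}
```

Note the split between compile and interpret: the IR list is the only thing passed between them, which mirrors (in miniature) the published-intermediate-code boundary between javac and the JVM. A real JVM would additionally verify the IR before running it, since it does not trust the producer.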