Safety and Types

Define a safety policy for executable code, e.g.:
- Base values (int, float, pointer, code address) are used appropriately (abstraction boundaries aren't violated)
- Memory safety: can only read/write appropriate memory regions (e.g., inside arrays)
- APIs are used appropriately (e.g., files are never closed twice)
- Code meets an arbitrary specification (arbitrarily hard to check, of course!)

What does it mean to conform to a safety policy? Formally, we can give a dynamic semantics (usually operational) for the language, e.g., a set of transition rules, which defines safe and unsafe behavior. Unsafe programs enter a "BAD" state, or reach a state from which there is no legal transition (they get "stuck"). Otherwise the program is OK. Executing an unsafe program must lead to an immediate halt with an informative error message.

Typing and Safety

Classical idea: define a static semantics (type system), e.g., a set of typing rules. Programs that pass the typing rules are said to be well-typed. Goal: define the system so that well-typed programs don't go wrong at runtime: WT(P) => OK(P).

Advantages:
- Get a check on correctness without actually running the program ("compile time" instead of "run time")
- Can omit runtime checks and so speed up execution
- The static check covers all possible execution paths

Of course, we would like the static rules to be decidable (or at least semi-decidable). In practice, divide properties into statically- and dynamically-checkable subsets. Note: static checks constrain programming style, since some (dynamically) safe programs are bound to be rejected by (conservative) static checkers.

But what about compiled/translated code? E.g., compiler C translates P to P'. We must now phrase the OK property in terms of the dynamic semantics of P' and show that OK(P) <=> OK'(P'). Traditionally, one tries to prove that if C is correct, then WT(P) => OK'(P'). This is hard to prove, because compilers are complex. And in a world of mobile applets, even if we trust the compiler, we don't trust that P' will reach the execution site without malicious interference. Also, we perhaps shouldn't trust the compiler anyhow!

New idea: define a static semantics (type system) for P' as well, and show that WT(P) => WT'(P') and WT'(P') => OK'(P').

Java Bytecode Verification (Leroy article)

This is how Java works: P is a .java source program, P' is its bytecode equivalent, WT() is source-level Java typing, and WT'() is bytecode verification. Note that bytecode instructions are typed, but locations (registers and stack slots) are not, so their types must be inferred.

Safety conditions.

Statically checked:
- Registers and stack slots are properly typed for the instructions that use them
- No stack underflow/overflow
- All jumps are to valid destinations
- Local variables are initialized before use
- Objects are initialized before use
- Access controls are respected
(The last two aren't safety properties.)

Dynamically checked:
- Array bounds
- Null pointers
- Downcasts

Unfortunately, Sun has no formal spec of the verification rules; academics have reverse-engineered one (several, actually).

Example of simplified VM dynamic and static rules (Stata & Abadi article): verification is a dataflow analysis, i.e., an abstract interpretation over types (see the sketch below). Note the problem with interfaces: Sun's verifier gives up because it is non-trivial to define least upper bounds on the type structure when interfaces are allowed; a class may implement several interfaces, so two reference types need not have a unique least upper bound.
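To make "dataflow analysis as abstract interpretation over types" concrete, here is a minimal sketch in Java of a verifier for an invented toy stack machine. The instruction set, the three-point type lattice, and the name ToyVerifier are made up for illustration and are much simpler than real JVM bytecode (no locals, no objects, no interfaces). Each instruction is checked against an abstract stack of types; abstract states are merged with a least upper bound wherever control-flow paths join, iterating to a fixpoint.

import java.util.*;

// Toy "verifier" in the style described above: an abstract interpretation over
// types, computed as a dataflow fixpoint. The instruction set and the
// three-point type lattice are invented for illustration.
public class ToyVerifier {

    enum Ty { INT, REF, TOP }                  // TOP = unknown/conflicting

    enum Op { PUSH_INT, PUSH_NULL, IADD, POP, IFZERO, GOTO, RETURN }

    record Insn(Op op, int target) {           // target is used only by jumps
        Insn(Op op) { this(op, -1); }
    }

    // Least upper bound on the toy type lattice.
    static Ty lub(Ty a, Ty b) { return a == b ? a : Ty.TOP; }

    // Returns true iff the code passes the static checks.
    static boolean verify(Insn[] code, int maxStack) {
        List<List<Ty>> in = new ArrayList<>(); // abstract stack at each pc (null = unreached)
        for (int k = 0; k < code.length; k++) in.add(null);
        Deque<Integer> work = new ArrayDeque<>();
        in.set(0, new ArrayList<>());          // entry point: empty stack
        work.push(0);

        while (!work.isEmpty()) {
            int pc = work.pop();
            List<Ty> st = new ArrayList<>(in.get(pc));
            Insn i = code[pc];
            switch (i.op()) {
                case PUSH_INT  -> st.add(Ty.INT);
                case PUSH_NULL -> st.add(Ty.REF);
                case IADD -> {
                    if (st.size() < 2) return false;              // stack underflow
                    Ty b = st.remove(st.size() - 1), a = st.remove(st.size() - 1);
                    if (a != Ty.INT || b != Ty.INT) return false; // ill-typed operands
                    st.add(Ty.INT);
                }
                case POP    -> { if (st.isEmpty()) return false; st.remove(st.size() - 1); }
                case IFZERO -> { if (st.isEmpty() || st.remove(st.size() - 1) != Ty.INT) return false; }
                case GOTO, RETURN -> { /* no stack effect */ }
            }
            if (st.size() > maxStack) return false;               // stack overflow

            // Propagate the abstract stack to all successors, merging at joins.
            List<Integer> succs = new ArrayList<>();
            if (i.op() == Op.GOTO) succs.add(i.target());
            else if (i.op() == Op.IFZERO) { succs.add(pc + 1); succs.add(i.target()); }
            else if (i.op() != Op.RETURN) succs.add(pc + 1);

            for (int s : succs) {
                if (s < 0 || s >= code.length) return false;      // bad jump destination
                List<Ty> old = in.get(s);
                if (old == null) { in.set(s, st); work.push(s); }
                else {
                    if (old.size() != st.size()) return false;    // stack heights must agree
                    List<Ty> merged = new ArrayList<>();
                    for (int k = 0; k < old.size(); k++) merged.add(lub(old.get(k), st.get(k)));
                    if (!merged.equals(old)) { in.set(s, merged); work.push(s); }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Insn[] ok  = { new Insn(Op.PUSH_INT), new Insn(Op.PUSH_INT), new Insn(Op.IADD),
                       new Insn(Op.POP), new Insn(Op.RETURN) };
        Insn[] bad = { new Insn(Op.PUSH_INT), new Insn(Op.PUSH_NULL), new Insn(Op.IADD),
                       new Insn(Op.RETURN) };                      // adds an int to a reference
        System.out.println(verify(ok, 4));    // true
        System.out.println(verify(bad, 4));   // false
    }
}

The lub in the merge step is exactly where the interface problem noted above bites: once two reference types can fail to have a unique least upper bound, the merge is no longer well-defined.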
Typed Assembly Language (TAL) -- Morrisett et al.

What happens when we translate from an abstract machine (like the JVM) or some other IR down to real machine code, optimizing as we go? We want to guarantee safety (more specifically: that abstractions are respected) at the machine-code level, again by static checks. So we must add typing rules, an operational semantics, and a notion of well-typedness to machine code. (TAL is not absolutely low-level, e.g., it still has a "malloc" instruction.)

Example of TAL/x86 code to show the style, the need for annotations, and the use of stack types (a toy checker in a similar spirit is sketched below). There are compilers that generate TAL from C-like source languages.
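For contrast with the verifier sketch above, here is a minimal sketch, again in Java and again for an invented toy machine, of checking TAL-style annotated code (the instruction set, the annotation form, and the name ToyTal are assumptions for illustration; Morrisett et al.'s real TAL/x86 is far richer, with stack types, type variables, and typed malloc). The point is the role of annotations: every label carries a "code type" stating the register types it requires on entry, so each block is checked locally against its own annotation and against the annotations of its jump targets, with no global fixpoint.

import java.util.*;

// Toy TAL-style checker: every label carries an annotation ("code type") giving
// the register types required on entry, so each block is checked locally. The
// instruction set, the annotation form, and all names here are invented.
public class ToyTal {

    enum Ty { INT }                                         // toy: only int registers

    enum Op { MOVI, ADD, JMP, HALT }                        // movi rd / add rd,rs / jmp L / halt

    record Insn(Op op, int rd, int rs, String label) {
        static Insn movi(int rd)        { return new Insn(Op.MOVI, rd, -1, null); }
        static Insn add(int rd, int rs) { return new Insn(Op.ADD, rd, rs, null); }
        static Insn jmp(String label)   { return new Insn(Op.JMP, -1, -1, label); }
        static Insn halt()              { return new Insn(Op.HALT, -1, -1, null); }
    }

    // A labelled block: its entry annotation plus its instructions.
    record Block(Map<Integer, Ty> entry, List<Insn> body) {}

    static boolean check(Map<String, Block> prog) {
        for (Block b : prog.values()) {
            Map<Integer, Ty> regs = new HashMap<>(b.entry());     // assume the annotation
            for (Insn i : b.body()) {
                switch (i.op()) {
                    case MOVI -> regs.put(i.rd(), Ty.INT);
                    case ADD  -> {
                        if (regs.get(i.rd()) != Ty.INT || regs.get(i.rs()) != Ty.INT)
                            return false;                         // ill-typed operand
                    }
                    case JMP  -> {
                        Block target = prog.get(i.label());
                        if (target == null) return false;         // jump to unknown label
                        // The current register typing must satisfy the target's annotation.
                        for (var e : target.entry().entrySet())
                            if (regs.get(e.getKey()) != e.getValue()) return false;
                    }
                    case HALT -> { /* nothing to check in this toy */ }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Block> prog = new HashMap<>();
        prog.put("start", new Block(Map.of(),
                 List.of(Insn.movi(1), Insn.movi(2), Insn.jmp("loop"))));
        prog.put("loop",  new Block(Map.of(1, Ty.INT, 2, Ty.INT),
                 List.of(Insn.add(1, 2), Insn.jmp("loop"))));
        System.out.println(check(prog));                          // true

        // Forgetting to initialize r2 violates loop's entry annotation.
        prog.put("start", new Block(Map.of(),
                 List.of(Insn.movi(1), Insn.jmp("loop"))));
        System.out.println(check(prog));                          // false
    }
}

This mirrors the "need for annotations" above: with a code type at every label, checking stays simple and local, whereas inferring those types (as the bytecode verifier does) requires a dataflow fixpoint and a well-behaved least upper bound.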