CS 320, Fall 2021

Lecture 6: static analysis, part 1

Introduction

Last week we talked about syntax, the "shape" of a program.

We zoomed in on the syntax analysis phase, where text becomes an AST.

Now, before we run our program, we can check our program.

This is the goal of the static analysis phase.

Today, we'll introduce the basics of typechecking.

In the next lecture, we'll discuss scoping and how it affects typechecking.

Static and dynamic

The adjectives static and dynamic are super important to PL theory.

They have many different meanings in English, but these are the relevant ones:

The static properties of a program are observed by reading the program.

The dynamic properties of a program are observed by running the program.

We don't use this phrase in practice, but we might call our execution phase "dynamic analysis".

Correctness

A program is correct if it always behaves exactly as intended.

What exactly is "intended" depends on context.

Specification

The specification (or "spec") of a program defines its intended behavior.

Without a specification, a program cannot be correct or incorrect.

Specifications are usually informal: written using natural language.

There are many different aspects of program behavior we may want to specify.

Different kinds of correctness

Many fields within CS study some kind of "correctness" criterion.

The field of computational complexity studies efficiency.

The field of human-computer interaction studies usability.

The field of computer security studies, well, security.

These are all forms of correctness, among many others.

Each of these fields provides tools to specify and analyze relevant forms of correctness.

Safety

In PL, we're mostly concerned with safety: the absence of errors.

(This is quite different from security, as a security class can explain.)

The study of PL provides tools to specify and analyze the safety of a program.

We include:

We consider a program correct if it always executes without errors.

Checking

How can we know whether a program is correct?

Testing

One way of checking a program's correctness is by testing it.

Courses on software engineering cover testing in depth.

By definition, software testing involves running the program.

This means that we have to come up with inputs to test with.

Testing is great, but this is a serious limitation.

Predictive reasoning

As a human, you can reason about an action without doing it, right?

You don't even need all the details of the situation.

For example, if I played a 1-on-1 match with any professional basketball player, I would certainly lose.

I know that without even knowing which professional player I'd be facing.

All I need to know is that they're in the set of professional players.

Static analysis

The purpose of static analysis is to predict results without running the program.

This means we don't have to come up with test inputs for static analysis.

Static analysis is limited too, although more powerful than you might expect.

Static analysis doesn't replace testing; they complement each other nicely.

Typechecking

The concept of static types is foundational in static analysis.

Knowing an expression's type helps us predict things about it.

Types

A type describes a set of expressions with some commonality.

Familiar types include integers, booleans, integer arrays, boolean arrays, ...

Nearly all programming languages have a concept of a "type", but they vary widely.

Dynamic types

Our interpreter for lab 4 threw a runtime error on 1 + "a".

This is a dynamic type error, because it happens at runtime.

Some languages are dynamically-typed: their types only exist at runtime.

Two of the most widely used modern dynamically-typed languages are Python and JavaScript.
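Here is a small Python sketch of a dynamic type error. Python stands in for our lab interpreter here, and the function name add_one_to_a is made up for illustration. The point is that defining the function is fine; the error only appears once the + actually runs.

```python
def add_one_to_a():
    # Python only checks the operand types when the + actually executes.
    return 1 + "a"

print("Defining the function raised no error.")

try:
    add_one_to_a()
except TypeError as err:
    # The error surfaces only now, at runtime.
    print("dynamic type error:", err)
```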

Static types

We can predict that 1 + "a" will fail, right?

Our language forbids adding any number to any string.

Consider the Python expression (9999999999 * "x") + (1 + "a").

At runtime, it slowly constructs a huge string and then throws an error.

A statically-typed language predicts these errors by typechecking.
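To see the wasted work on a small scale, here is a hedged Python sketch with the same shape as the expression above, using a repeat count of 5 instead of 9999999999. The helper repeat and the list log are hypothetical names added only to record what happens. Python evaluates the left operand of + before the right one, so the string is built before the error is discovered.

```python
log = []

def repeat(n, s):
    """Stand-in for n * s that records that the work was done."""
    result = n * s
    log.append(f"built a string of length {len(result)}")
    return result

try:
    # Same shape as (9999999999 * "x") + (1 + "a"), with a small count.
    repeat(5, "x") + (1 + "a")
except TypeError:
    log.append("then hit a type error")

print(log)
```

A typechecker would reject this expression up front, so none of the string-building work would ever happen.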

Type safety

A statically-typed language is type-safe if it never throws a type-related error at runtime.

An ideal typechecker guarantees type safety by catching errors before runtime.

Typing rules

We have a special way to write the rules that define a typechecker.

These typing rules are a specification for our typechecker code.

Reading typing rules

Typing rules look like this:

             premise1   premise2   premise3   ...
  RuleName  --------------------------------------
                          conclusion

In each rule, if all premises are true, then the conclusion is true.

If the top is empty, then the conclusion is just true.

  RuleName  ------------
             conclusion

The syntax e : t means "the expression e has type t".

We call this a typing judgement.

(This is a different meaning of : than in our ?: operator.)

Calculator language typing rules

We have three types: number, boolean, and string.

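We'll use this naming convention in our rules:

n is any number value, b is any boolean value, and s is any string value.

e1, e2, and e3 are any expressions.

t1 and t2 are any types.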

These aren't all of the typing rules for our language, just a sample.

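  Valuen  ------------
           n : number

  Valueb  -------------
           b : boolean

  Values  ------------
           s : string

          e1 : number   e2 : number
  Plusn  ---------------------------
             (e1 + e2) : number

          e1 : string   e2 : string
  Pluss  ---------------------------
             (e1 + e2) : string

              e1 : string   e2 : string
  LessThans  ---------------------------
                 (e1 < e2) : boolean

            e1 : number   e2 : string
  Timesns  ---------------------------
               (e1 * e2) : string

                e1 : boolean   e2 : t1   e3 : t2   t1 = t2
  Conditional  --------------------------------------------
                           (e1 ? e2 : e3) : t1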
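As a rough sketch of how these sample rules might translate into code, here is a small Python typechecker. The AST classes (Value, BinOp, Conditional), the function typecheck, and the exception TypeCheckError are all assumptions made for this example, not the lab's actual design. Each return is labeled with the rule it implements.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Value:
    value: Union[bool, int, float, str]

@dataclass
class BinOp:
    op: str  # "+", "<", or "*"
    left: "Expr"
    right: "Expr"

@dataclass
class Conditional:
    test: "Expr"
    then: "Expr"
    otherwise: "Expr"

Expr = Union[Value, BinOp, Conditional]

class TypeCheckError(Exception):
    """Raised when no sample rule applies to an expression."""

def typecheck(e):
    """Return "number", "boolean", or "string", or raise TypeCheckError."""
    if isinstance(e, Value):
        # bool is tested first because Python's bool is a subclass of int.
        if isinstance(e.value, bool):
            return "boolean"  # Valueb
        if isinstance(e.value, (int, float)):
            return "number"   # Valuen
        if isinstance(e.value, str):
            return "string"   # Values
        raise TypeCheckError(f"unknown value: {e.value!r}")
    if isinstance(e, BinOp):
        t1, t2 = typecheck(e.left), typecheck(e.right)
        if e.op == "+" and (t1, t2) == ("number", "number"):
            return "number"   # Plusn
        if e.op == "+" and (t1, t2) == ("string", "string"):
            return "string"   # Pluss
        if e.op == "<" and (t1, t2) == ("string", "string"):
            return "boolean"  # LessThans
        if e.op == "*" and (t1, t2) == ("number", "string"):
            return "string"   # Timesns
        raise TypeCheckError(f"no rule for {t1} {e.op} {t2}")
    if isinstance(e, Conditional):
        t1, t2, t3 = typecheck(e.test), typecheck(e.then), typecheck(e.otherwise)
        if t1 == "boolean" and t2 == t3:
            return t2         # Conditional
        raise TypeCheckError(f"no rule for {t1} ? {t2} : {t3}")
    raise TypeCheckError(f"not an expression: {e!r}")

# (2 * "a") + "b"
print(typecheck(BinOp("+", BinOp("*", Value(2), Value("a")), Value("b"))))  # string
```

Because this sketch covers only the sample rules, it rejects some expressions the full language would presumably accept, such as comparing two numbers with <.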

Typing derivations

The typing rules tell us how to both write and check our typechecking code.

We'll see how to write typechecking code from specification in lab soon.

To check a type on paper, we play a little game of typing derivations.

We start with the typing judgement we want to check under a line, for example:

  ----------------------------
   ((2 * "a") + "b") : string

Next we apply a rule whose bottom judgement looks like ours.

          (2 * "a") : string   "b" : string
  Pluss  -----------------------------------
             ((2 * "a") + "b") : string

None of the other rules apply in this case, but sometimes we may have multiple rules to choose between.

Now we have to repeat the process for both premises.

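                  Valuen  ------------   Values  --------------
                           2 : number             "a" : string
         Timesns  ---------------------------------------------   Values  --------------
                               (2 * "a") : string                          "b" : string
  Pluss  -------------------------------------------------------------------------------
                                   ((2 * "a") + "b") : string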

The derivation is complete when all premises are under a line. This shows our original judgement to be true.

The t1 = t2 premise in the Conditional rule is special: we just put a line over it if it's obviously true, like string = string or number = number.

If we find that it is impossible to produce a complete derivation, we have shown our original judgement to be false.

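                  Valuen  ------------   Values  --------------
                           2 : number             "a" : string
         Timesns  ---------------------------------------------
                               (2 * "a") : string                 3 : string
  Pluss  -------------------------------------------------------------------
                              ((2 * "a") + 3) : string

No rule has a conclusion that looks like 3 : string, so that premise can never go under a line.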
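The derivation game can also be sketched in Python as a search for a rule whose conclusion matches the judgement. Everything here is an assumption made for illustration: expressions are written as nested tuples (op, left, right) with plain Python literals as values, and the names show, BINARY_RULES, derive, and render are hypothetical. The Conditional rule is left out to keep the sketch short.

```python
def show(e):
    """Render an expression in the calculator language's surface syntax."""
    if isinstance(e, tuple):
        op, left, right = e
        return f"({show(left)} {op} {show(right)})"
    if isinstance(e, bool):
        return str(e).lower()
    return f'"{e}"' if isinstance(e, str) else str(e)

# (operator, left type, right type) -> (rule name, result type)
BINARY_RULES = {
    ("+", "number", "number"): ("Plusn", "number"),
    ("+", "string", "string"): ("Pluss", "string"),
    ("<", "string", "string"): ("LessThans", "boolean"),
    ("*", "number", "string"): ("Timesns", "string"),
}

def derive(e, t):
    """Return a derivation tree for the judgement e : t, or None if there is none."""
    judgement = f"{show(e)} : {t}"
    if not isinstance(e, tuple):
        # Value rules have no premises. bool is checked first because
        # Python's bool is a subclass of int.
        if isinstance(e, bool):
            rule, actual = "Valueb", "boolean"
        elif isinstance(e, (int, float)):
            rule, actual = "Valuen", "number"
        elif isinstance(e, str):
            rule, actual = "Values", "string"
        else:
            return None
        return (rule, judgement, []) if actual == t else None
    op, left, right = e
    for (rule_op, t1, t2), (rule, result) in BINARY_RULES.items():
        # Only try rules whose bottom judgement looks like ours.
        if rule_op == op and result == t:
            premises = [derive(left, t1), derive(right, t2)]
            if None not in premises:
                return (rule, judgement, premises)
    return None

def render(tree, depth=0):
    """Flatten a derivation tree into indented lines, conclusion first."""
    rule, judgement, premises = tree
    lines = ["  " * depth + f"{judgement}   [{rule}]"]
    for premise in premises:
        lines.extend(render(premise, depth + 1))
    return lines

good = ("+", ("*", 2, "a"), "b")
bad = ("+", ("*", 2, "a"), 3)

print("\n".join(render(derive(good, "string"))))
print(derive(bad, "string"))  # None: no complete derivation exists
```

The first print lists the judgement for the whole expression followed by its premises, each indented under the rule that uses it. The second prints None because the premise 3 : string cannot be derived.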

Looking forward

That's it for this week!

Next week, we'll talk all about variables and scope.

We'll finally have a real programming language to play with!