Last week we talked about syntax, the "shape" of a program.
We zoomed in on the syntax analysis phase, where text becomes an AST.
Now, before we run our program, we can check it.
This is the goal of the static analysis phase.
Today, we'll introduce the basics of typechecking.
In the next lecture, we'll discuss scoping and how it affects typechecking.
The adjectives static and dynamic are super important to PL theory.
They have many different meanings in English, but these are the relevant ones:
The static properties of a program are observed by reading the program.
The dynamic properties of a program are observed by running the program.
We don't use this phrase in practice, but we might call our execution phase "dynamic analysis".
A program is correct if it always behaves exactly as intended.
What exactly is "intended" depends on context.
The specification (or "spec") of a program defines its intended behavior.
Without a specification, a program cannot be correct or incorrect.
Specifications are usually informal: written using natural language.
There are many different aspects of program behavior we may want to specify.
Many fields within CS study some kind of "correctness" criteria.
The field of computational complexity studies efficiency.
The field of human-computer interaction studies usability.
The field of computer security studies, well, security.
These are all forms of correctness, among many others.
Each of these fields provides tools to specify and analyze relevant forms of correctness.
In PL, we're mostly concerned with safety: the absence of errors.
(This is quite different from security, as a security class can explain.)
The study of PL provides tools to specify and analyze the safety of a program.
For our purposes, the spec includes one key requirement: we consider a program correct if it always executes without errors.
How can we know whether a program is correct?
One way of checking a program's correctness is by testing it.
Courses on software engineering cover testing in depth.
By definition, software testing involves running the program.
This means that we have to come up with inputs to test with.
Testing is great, but this is a serious limitation.
As a human, you can reason about an action without doing it, right?
You don't even need all the details of the situation.
For example, if I played a 1-on-1 match with any professional basketball player, I would certainly lose.
I know that without even knowing which professional player I'd be facing.
All I need to know is that they're in the set of professional players.
The purpose of static analysis is to predict results without running the program.
This means we don't have to come up with test inputs for static analysis.
Static analysis is limited too, although more powerful than you might expect.
Static analysis doesn't replace testing; they complement each other nicely.
The concept of static types is foundational in static analysis.
Knowing an expression's type helps us predict things about it.
A type describes a set of expressions with some commonality.
Familiar types include integers, booleans, integer arrays, boolean arrays, ...
Nearly all programming languages have a concept of a "type", but they vary widely.
Our interpreter for lab 4 threw a runtime error on 1 + "a".
This is a dynamic type error, because it happens at runtime.
Some languages are dynamically-typed: their types only exist at runtime.
The most widely used dynamically-typed languages today are Python and JavaScript.
We can predict that 1 + "a" will fail, right?
Our language forbids adding any number to any string.
Consider the Python expression (9999999999 * "x") + (1 + "a").
At runtime, it slowly constructs a huge string and then throws an error.
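We can see this behavior directly in Python. A sketch of the same failure, using a small repeated string instead of the lecture's ten-billion-character one so it runs instantly:

```python
# Python discovers the type error only at runtime.
# Evaluation is left to right: 3 * "x" succeeds and builds "xxx" first,
# then 1 + "a" raises a TypeError.
try:
    result = (3 * "x") + (1 + "a")
except TypeError as err:
    print("caught at runtime:", err)
```

A statically-typed language would reject this expression before doing any work at all, instead of wasting time building the string first.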
A statically-typed language predicts these errors by typechecking.
A statically-typed language is type-safe if it never throws a type-related error at runtime.
An ideal typechecker guarantees type safety by catching errors before runtime.
We have a special way to write the rules that define a typechecker.
These typing rules are a specification for our typechecker code.
Typing rules are written with zero or more premises above a horizontal line and a conclusion below it.
In each rule, if all premises are true, then the conclusion is true.
If there are no premises above the line, then the conclusion is simply true.
The syntax e : t means "the expression e has type t".
We call this a typing judgement.
(This is a different meaning of : than in our ?: operator.)
We have three types: number, boolean, and string.
We'll use this naming convention in our rules:
These aren't all of the typing rules for our language, just a sample.
The typing rules tell us how to both write and check our typechecking code.
We'll see how to write typechecking code from specification in lab soon.
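To make the connection concrete, here is a minimal typechecker sketch, not the lab's actual code. The AST representation (tagged tuples) and the assumption that + accepts only numbers are our own; the document's language may differ in both respects.

```python
# Hypothetical AST: ("num", 3), ("str", "a"), ("bool", True),
# ("add", e1, e2), ("cond", guard, then_branch, else_branch).

def typecheck(e):
    tag = e[0]
    # Literal rules: no premises, so the conclusion holds immediately.
    if tag == "num":
        return "number"
    if tag == "str":
        return "string"
    if tag == "bool":
        return "boolean"
    if tag == "add":
        # Assumed addition rule: both operands must be numbers.
        if typecheck(e[1]) == "number" and typecheck(e[2]) == "number":
            return "number"
        raise TypeError("operands of + must both be numbers")
    if tag == "cond":
        # Conditional rule: guard is a boolean, branches agree (t1 = t2).
        if typecheck(e[1]) != "boolean":
            raise TypeError("condition must be a boolean")
        t1, t2 = typecheck(e[2]), typecheck(e[3])
        if t1 != t2:
            raise TypeError("branches must have the same type")
        return t1
    raise ValueError("unknown expression")

print(typecheck(("add", ("num", 1), ("num", 2))))  # number
```

Note how each `if` branch mirrors one typing rule: checking the premises is a recursive call, and returning a type corresponds to concluding a judgement e : t. An ill-typed input like `("add", ("num", 1), ("str", "a"))` is rejected before anything runs.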
To check a type on paper, we play a little game of typing derivations.
We start with the typing judgement we want to check under a line, for example:
Next we apply a rule whose bottom judgement looks like ours.
None of the other rules apply in this case, but sometimes we may have multiple rules to choose between.
Now we have to repeat the process for both premises.
The derivation is complete when all premises are under a line. This shows our original judgement to be true.
The t1 = t2 premise in the Conditional rule is special: we just put a line over it if it's obviously true, like string = string or number = number.
If we find that it is impossible to produce a complete derivation, we have shown our original judgement to be false.
That's it for this week!
Next week, we'll talk all about variables and scope.
We'll finally have a real programming language to play with!