From ad-hoc to generic

- So far, we’ve been building bare-metal applications in an ad-hoc manner
- … which would be reasonable in a custom embedded system
- … but what if we want a more generic, reusable foundation for building and deploying computer systems?
- (also known as an “operating system” 😊)
- Let’s take a look at L4 as an initial case study …

Why L4?

Context …

In the “Programatica” project, we were looking to build an OS kernel with very high assurance of separation between domains

Approaches to Kernel Design

- In a **monolithic kernel**, all OS code runs in kernel mode
  - improves performance; reduces reliability
- A **microkernel** design aims to minimize the amount of code that runs in kernel mode (the “trusted computing base” or TCB) and implement as much functionality as it can in “user level servers”
  - A microkernel must abstract physical memory, CPU (threads), and interrupts/exceptions
  - A microkernel must also provide (efficient) mechanisms for communication and synchronization
  - A microkernel should be “policy free”

Microkernel design: L4

- L4 is a “second generation” µ-kernel design, originally designed by Jochen Liedtke.
- Designed to show that µ-kernel based systems are usable in practice with good performance.
- Minimalist philosophy: If it can be implemented outside the kernel, it doesn’t belong inside.

Why pick L4?

- L4 is industrially and technically relevant.
  - Multiple working implementations (Pistachio, Fiasco, OKL4, etc…).
  - Multiple supported architectures (ia32, arm, powerpc, mips, sparc, …).
  - Already used in a variety of domains, including real-time, security, virtual machines & monitors, etc…
  - Open Kernel Labs spin-off from NICTA & UNSW.
  - Commercial use by Qualcomm and others …

- L4 is small enough to be tractable.
  - Original implementation ~ 12K executable.
  - Recent/portable/flexible implementations ~ 10-20 KLOC C++.
  - Much easier to implement than a full POSIX OS, for example!

- L4 is real enough to be interesting.
  - For example, we can run multiple, separated instances of Linux (specifically: L4Linux, Wombat) on top of an L4 µ-kernel.
  - Use somebody else’s POSIX layer rather than build our own!
  - Detailed specification documents are available.

- L4 is a good representative of the target domain and a good tool for exposing core research challenges.
  - Threads, address spaces, IPC, preemption, interrupts, etc… are core µ-kernel concepts, regardless of API details.
  - It should be possible to retarget to a different API or µ-kernel design.

- L4 is “not invented here”.
  - We’re not in the business of OS design and implementation.
  - Leverage the insights and expertise of the OS community so that we can focus on our own research goals.
  - A credibility boost, showing that our methods apply to other people’s problems (we can’t change the OS design to make our lives easier …).

Evolution of L4

- IA32
- clans & chiefs
- portable
- privileged spaces
- global thread ids
- redirection
- multiprocessor
- IA32 & ARM
- capability-based

NICTA N1

- For concreteness, this presentation will be based (mostly) on the NICTA N1 version of the L4 spec
- Available in reference section of D2L course content
- (primary reference for pork)
- Lots of diagrams of bitdata and memory area structures
- ... implications for language design?

Address Space Layout

Userspace perspective

Kernel Information Page
(mapped into every address space)

User Thread Control Block

One UTCB for each (possible) thread in the address space

How to find the KIP

- Option 1: Design protocol
  - User code assumes a predetermined KIP address

- Option 2: “Slow system call” … a “virtual” instruction
  - User code executes the illegal instruction LOCK NOP
  - This triggers an illegal opcode exception, which enters the kernel
  - The kernel checks for this exception, loads the KIP address into the context registers, and returns to user mode

What are the gaps for?


What’s in the UTCB area?

- Every user thread has a User Thread Control Block (UTCB), which is a block of memory that the thread uses for communication with the kernel.
- The UTCB contains:
  - Message registers (MRs)
  - Thread control registers (TCRs)
- All UTCBs for a given address space are grouped in a single block called the UTCB area
- Example: If UTCBs are 512 bytes long, then an address space with a 4KB UTCB area can support at most 8 threads
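
The arithmetic in the example above can be sketched in C. This is only an illustration: the helper names `max_threads` and `utcb_slot_address`, and the idea of numbering UTCB slots from the start of the area, are assumptions made here and are not part of the L4 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maximum number of threads an address space can hold, given the size of
 * its UTCB area and the size of a single UTCB (both in bytes). */
static size_t max_threads(size_t utcb_area_size, size_t utcb_size)
{
    assert(utcb_size > 0);               /* avoid division by zero */
    return utcb_area_size / utcb_size;
}

/* Address of the n-th UTCB in an area that starts at area_base.
 * Hypothetical helper: slot numbering is an assumption of this sketch. */
static uint32_t utcb_slot_address(uint32_t area_base, uint32_t utcb_size,
                                  uint32_t n)
{
    return area_base + n * utcb_size;
}
```

With 512-byte UTCBs and a 4KB area, `max_threads(4096, 512)` gives the limit of 8 threads quoted above.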

Trust, and UTCBs

- User processes can read and write whatever values they like in the UTCB (and in the UTCBs of other threads in the same address space)
- Protected thread parameters (e.g., priority) must be stored in a separate TCB data structure that is only accessible to the kernel
- Any data that is read from the UTCB cannot be trusted and must be validated by the kernel, as necessary, before use
- Mappings for the UTCB area must be created by the kernel (otherwise user space code could cause the kernel to page fault by reading from an unmapped UTCB)

UTCB addresses and local thread ids

- Every UTCB must be 64-byte aligned, so the lower 6 bits in any UTCB address will be zero
- Within a given address space, UTCB addresses are used as local thread ids:

<table>
<thead>
<tr>
<th>Bits 31–6</th>
<th>Bits 5–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>local id / 64</td>
<td>000000</td>
</tr>
</tbody>
</table>

- Other thread ids must have a nonzero value in their least significant 6 bits
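
The rule that separates local thread ids from other thread ids is a simple bit test. The sketch below assumes 32-bit thread ids; the names `is_local_id` and `local_id_from_utcb` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UTCB_ALIGN_BITS 6u                              /* 64-byte alignment */
#define UTCB_ALIGN_MASK ((1u << UTCB_ALIGN_BITS) - 1u)  /* 0x3f */

/* A thread id is a local id exactly when its low 6 bits are all zero. */
static bool is_local_id(uint32_t id)
{
    return (id & UTCB_ALIGN_MASK) == 0;
}

/* A properly aligned UTCB address can be used directly as a local id. */
static uint32_t local_id_from_utcb(uint32_t utcb_addr)
{
    assert(is_local_id(utcb_addr));   /* UTCBs must be 64-byte aligned */
    return utcb_addr;
}
```

Because the test only inspects the low 6 bits, the kernel can distinguish the two kinds of id without any table lookup.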

How to find the UTCB

- Option 1: Design Protocol
  - User code assumes a predetermined UTCB address
- Option 2: The UTCB pointer
  - At boot time, the kernel creates a 4 byte, read only segment in the GDT for a specific kernelspace address and loads a corresponding segment selector in %gs
  - The kernel stores the UTCB address of the current thread in that location
  - User code can read the UTCB address from %gs:0

Configuring an address space

- The addresses of the KIP and the UTCB can be set when a new address space is created:
- First, create a new thread in a new address space (we’ll see how this is done soon)
- Now use the (privileged) SpaceControl system call:

[Figure: register-level calling convention for SpaceControl; the inputs are SpaceSpecifier, control, KernelInterfacePageArea, and UtcbArea]

- Threads cannot be activated (made runnable) until the associated address space has been configured in this way

Thread numbers

- Every thread number falls into one of three ranges:

<table>
<thead>
<tr>
<th>0 to SystemBase − 1</th>
<th>SystemBase to UserBase − 1</th>
<th>UserBase and above</th>
</tr>
</thead>
<tbody>
<tr>
<td>hardware interrupts</td>
<td>kernel reserved</td>
<td>user thread numbers</td>
</tr>
</tbody>
</table>

- The SystemBase and UserBase values are defined in the KIP
- Key insight: L4 translates hardware interrupts into messages from (special) threads
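
As a rough sketch, the three ranges can be told apart with two comparisons. The enum and function names are hypothetical, and the sketch assumes that each range includes its lower bound and excludes its upper bound; in a real system the two base values would be read from the KIP rather than passed as arguments.

```c
#include <assert.h>
#include <stdint.h>

enum thread_class {
    THREAD_INTERRUPT,        /* 0 <= n < SystemBase          */
    THREAD_KERNEL_RESERVED,  /* SystemBase <= n < UserBase   */
    THREAD_USER              /* UserBase <= n                */
};

/* Classify a thread number using the SystemBase and UserBase values. */
static enum thread_class classify_thread_no(uint32_t thread_no,
                                            uint32_t system_base,
                                            uint32_t user_base)
{
    assert(system_base <= user_base);
    if (thread_no < system_base)
        return THREAD_INTERRUPT;
    if (thread_no < user_base)
        return THREAD_KERNEL_RESERVED;
    return THREAD_USER;
}
```

For example, with illustrative values SystemBase = 16 and UserBase = 64, thread number 1 would be classified as an interrupt thread.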

Global ids bad ...

- The reliance on global ids is one of the weaknesses of the original L4 design
  - Any thread can reference any other thread by using its global id
  - Any thread can interfere with another thread (e.g., a denial of service attack) by using its global id
  - Even if thread ids are not officially published, they can still be guessed or faked
- We could avoid these problems if there were a way to ensure that any thread only had the capability to access a specific set of authorized threads ...

ThreadControl

- New threads are created using the (privileged) ThreadControl system call:

[Figure: register-level calling convention for ThreadControl; the inputs are dest, Pager, Scheduler, SpaceSpecifier, and UtcbLocation, and the result is returned in EAX]

  - If dest does not exist then the new thread is created in the same address space as SpaceSpecifier
  - If SpaceSpecifier=dest, then a new address space is created
  - The UtcbLocation must be within the UTCB area
  - If dest exists and SpaceSpecifier is nilthread, then the thread is deleted
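
The case analysis above can be written down as a small decision function. This is a sketch of the rules as stated on this slide, not kernel code: the names are hypothetical, `NILTHREAD` is assumed to be the all-zero id, and the `TC_MODIFY` and `TC_INVALID` outcomes are assumptions added only to cover the combinations the slide does not discuss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t thread_id;
#define NILTHREAD ((thread_id)0)   /* assumed encoding of nilthread */

enum tc_action {
    TC_CREATE_IN_NEW_SPACE,       /* dest is new, SpaceSpecifier == dest     */
    TC_CREATE_IN_EXISTING_SPACE,  /* dest is new, joins SpaceSpecifier space */
    TC_DELETE,                    /* dest exists, SpaceSpecifier == nilthread */
    TC_MODIFY,                    /* assumption: dest exists, space given    */
    TC_INVALID                    /* assumption: new dest with no space      */
};

/* Decide what a ThreadControl request means, following the slide's rules. */
static enum tc_action thread_control_action(bool dest_exists,
                                            thread_id dest,
                                            thread_id space_specifier)
{
    assert(dest != NILTHREAD);
    if (dest_exists)
        return space_specifier == NILTHREAD ? TC_DELETE : TC_MODIFY;
    if (space_specifier == NILTHREAD)
        return TC_INVALID;
    if (space_specifier == dest)
        return TC_CREATE_IN_NEW_SPACE;
    return TC_CREATE_IN_EXISTING_SPACE;
}
```

A real kernel would also check the caller's privilege and the UtcbLocation before acting on any of these cases.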

Exception handlers, pagers, and schedulers

- Every thread t has three associated threads:

  - The exception handler is responsible for dealing with any exceptions that t generates (specified in UTCB)
  - The pager is responsible for dealing with any page faults that t generates (specified in UTCB)
  - The scheduler is responsible for setting the priority and timeslice for t (hidden inside kernel TCB)

ExchangeRegisters

- A thread can read or write parameters of another thread
- ExchangeRegisters is not “privileged” … but the destination thread must be in the same address space as the caller

Timeslices and quanta

- The timeslice specifies how long a thread runs before the kernel will switch to another thread
- The quantum specifies a limit on the total time that a thread can run before it is suspended

ThreadSwitch

- A thread can give up any remaining part of its timeslice to another thread using the ThreadSwitch system call
- If dest is nilthread, then the caller still yields the CPU and the kernel determines which thread will run next …

Synchronization and blocking

- Communication between threads requires a sender and a receiver
- If either party is not ready, then the communication blocks
- Some versions of L4 allow an IPC call to specify timeout periods, after which a blocked IPC call will be aborted.
- In practice, it is hard to come up with a good methodology for picking sensible timeout values
- Other versions of L4 support only two possible timeout options: 0 (non-blocking) and \( \infty \) (blocking)

Message tags

- The value in MR\(_0\) provides a message tag that describes the structure of the message in the remaining message registers:

  \[
  \text{MR}_0 = \begin{array}{|c|c|c|c|}
  \hline
  \text{label} & \text{flags} & t & u \\
  \hline
  \end{array}
  \]

  - label can be used to send/receive a 16 bit data value
  - \( u \) is the number of untyped words (uninterpreted 32 bit word values) sent in message registers
  - \( t \) is the number of typed-item words (Mapitem, Grantitem; we'll talk about these soon …)
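
A message tag can be packed and unpacked with a few shifts and masks. The sketch below assumes a 32-bit tag with the label in the top 16 bits, followed by 4 flag bits, 6 bits for t, and 6 bits for u in the least significant position; the 4/6/6 field widths are an assumption based on the L4 X.2 reference layout, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (most significant first): label:16 | flags:4 | t:6 | u:6 */
static uint32_t tag_untyped(uint32_t tag) { return tag & 0x3fu; }
static uint32_t tag_typed(uint32_t tag)   { return (tag >> 6) & 0x3fu; }
static uint32_t tag_flags(uint32_t tag)   { return (tag >> 12) & 0xfu; }
static uint32_t tag_label(uint32_t tag)   { return tag >> 16; }

/* Build a tag with all flag bits clear. */
static uint32_t make_tag(uint32_t label, uint32_t t, uint32_t u)
{
    assert(label <= 0xffffu && t <= 0x3fu && u <= 0x3fu);
    return (label << 16) | (t << 6) | u;
}
```

Under this layout an empty message with label 0 is simply the word 0, which is why a tag of 0 can be loaded into MR0 to send a message with no payload.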

Example: Thread start

- When a new thread is constructed, it waits for a message from its pager before starting:

  \[
  \begin{array}{rl}
  \text{MR}_2: & \text{Initial SP} \\
  \text{MR}_1: & \text{Initial IP} \\
  \text{MR}_0: & \begin{array}{|c|c|c|c|} \hline 0 & 0 & t = 0 & u = 2 \\ \hline \end{array}
  \end{array}
  \]

  - When a newly created thread receives a message of this form, the kernel loads the specified eip and esp values from the message into the thread's context and marks the thread as being runnable …

Example: Interrupt handlers

- When a hardware interrupt occurs, the kernel sends an IPC message from the interrupt thread to its pager with the tag:

  \[
  \text{From Interrupt Thread:} \quad \text{MR}_0 = \begin{array}{|c|c|c|c|}
  \hline
  -1 & 0 & t = 0 & u = 0 \\
  \hline
  \end{array}
  \]

  - When the pager has finished handling the interrupt, it sends an IPC message back to the interrupt thread to reenable the corresponding interrupt:

  \[
  \text{To Interrupt Thread:} \quad \text{MR}_0 = \begin{array}{|c|c|c|c|}
  \hline
  0 & 0 & t = 0 & u = 0 \\
  \hline
  \end{array}
  \]

Example: Exception handling

- When a thread generates an exception, the kernel sends a message to the associated exception handler

  [Figure: exception message format, carrying the exception number (ExceptionNo) and the saved register values of the faulting thread]

  - If the exception handler chooses to resume the thread that generated the exception, it responds with a message of essentially the same format (possibly having updated registers in the process)

Address Space Management

Mapping and granting

- Address spaces in L4 are constructed by mapping or granting regions of memory between address spaces

Flexpages (fpages)

- A generalized form of “page” that can vary in size:
- Includes both 4KB pages and 4MB superpages as special cases
- Also includes special cases to represent the full address space (complete) and the empty address space (nilpage):
- Can be represented, in practice, using collections of 4KB and 4MB pages
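
To illustrate the last point, the sketch below counts the hardware pages needed to cover one fpage. It assumes that an fpage has size 2^s bytes and is aligned to its own size, so that any fpage of 4MB or more splits cleanly into 4MB superpages; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K_BITS 12u   /* log2(4KB) */
#define PAGE_4M_BITS 22u   /* log2(4MB) */

struct page_cover {
    uint32_t superpages_4m;  /* number of 4MB superpages used */
    uint32_t pages_4k;       /* number of 4KB pages used      */
};

/* Cover an fpage of 2^size_bits bytes (assumed size-aligned) using the
 * largest hardware page size that fits. */
static struct page_cover cover_fpage(unsigned size_bits)
{
    struct page_cover c = { 0, 0 };
    assert(size_bits >= PAGE_4K_BITS && size_bits <= 32);
    if (size_bits >= PAGE_4M_BITS)
        c.superpages_4m = 1u << (size_bits - PAGE_4M_BITS);
    else
        c.pages_4k = 1u << (size_bits - PAGE_4K_BITS);
    return c;
}
```

A 64KB fpage (`size_bits == 16`) is covered by sixteen 4KB pages, while a 4MB fpage needs just one superpage.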

MapItems and GrantItems

- A MapItem specifies a region of memory in the sender’s address space that will be mapped into the receiver’s address space
- A GrantItem specifies a region of memory that will be removed from the sender’s address space and added to the receiver’s address space
- Base values are used for mapping between fpages of different sizes; we will mostly ignore them for now

Typed items in IPC messages

- An IPC message can contain multiple “typed items” (either MapItem or GrantItem values), that will create mappings in the receiver based on mappings in the sender
- The receiver sets an “acceptor” fpage in its UTCB to specify where newly received mappings should be received
- To receive anywhere, set the acceptor to “complete”
- To receive nowhere, set the acceptor to “nilpage”

Page faults

- When a thread triggers a page fault, the kernel translates that event into an IPC to the thread’s pager:
- The pager can respond by sending back a reply with a new mapping … that also restarts the faulting thread:

The “recursive address space model”

- The initial address spaces are created by the kernel at boot time
- threads in these address spaces are “privileged”
- all physical memory is mapped in $\sigma_0$
- In a dynamic system, we need the ability to revoke previous mappings … this will get interesting …

Let’s look at an example …

A demo using “pork”

Initialization code in root.c:

```c
printf("This is a root server!\n");
showKIP();
ping = L4_GlobalId(100, 1);  // thread number 100; the version value 1 is assumed
pong = L4_GlobalId(200, 1);  // thread number 200; the version value 1 is assumed

//startPing();
spawn("ping", ping,                               // Name & thread id
      0, ping,                                    // struct & space spec
      L4_Myself(),                                // Scheduler
      L4_Pager(),                                 // Pager
      (L4_Word_t)ping_thread,                     // eip
      ((L4_Word_t)pingstack) + PINGSTACKSIZE);    // esp (top of stack)

//startPong();
spawn("pong", pong,                               // Name & thread id
      0, pong,                                    // struct & space spec
      L4_Myself(),                                // Scheduler
      L4_Pager(),                                 // Pager
      (L4_Word_t)pong_thread,                     // eip
      ((L4_Word_t)pongstack) + PONGSTACKSIZE);    // esp (top of stack)

// Keyboard listener: associate the keyboard interrupt thread with this thread
L4_ThreadId_t keyId  = L4_GlobalId(1,1);
L4_ThreadId_t rootId = L4_MyGlobalId();
printf("keyboard id = %x, my id = %x\n", keyId, rootId);
printf("associate produces %x\n", L4_AssociateInterrupt(keyId, rootId));
```

Event loop code in root.c:

```c
L4_MsgTag_t tag = L4_Receive(keyId);  // wait for the first keyboard interrupt
for (;;) {
    printf("received msg (tag=%x) from %x\n", tag, keyId);
    if (L4_IpcSucceeded(tag) &&
        L4_UntypedWords(tag) == 0 &&
        L4_TypedWords(tag) == 0) {
        // An empty message from the interrupt thread signals an interrupt
        printf("Scancode = 0x%x\n", inb(0x60));
        L4_LoadMR(0, 0); // tag: Empty message, ping back to interrupt thread
        tag = L4_Call(keyId); // reenable the interrupt and wait for the next one
        printf("Root's Call completed ...\n");
    } else {
        printf("Ignoring message/failure, trying again ...\n");
        tag = L4_Receive(keyId);
    }
}
printf("This message won't appear!\n"); // unreachable: the loop never exits
```