Assignment 2 - add a system call

For this assignment, you must add a new system call to Linux that informs the calling process how many times it has been context switched since the last time it checked. You will need to: add the system call itself (backed by a per-task counter), update that counter in sched.c and initialize it in fork.c, and write a user space program that exercises the new call.

Note: If you have trouble compiling your user space program, try setting /usr/include/asm and /usr/include/linux to be symbolic links to their respective directories in the source tree.

I recommend protecting your code with #ifdefs and adding a configuration option to the "make xconfig" menu. You can do this by adding code similar to the following to arch/i386/config.in:

mainmenu_option next_comment
comment 'cse513 options'
bool 'Enable context switch counter syscall' CONFIG_CTX_SYSCALL
endmenu
 
The next time you run "make xconfig", you should see a new top level menu with an option to enable or disable the context switch counter (assuming all of your new code is inside #ifdef CONFIG_CTX_SYSCALL / #endif pairs). Make sure any new source files include linux/config.h. You can add documentation for new options by editing Documentation/Configure.help.

About system calls

So, you might be wondering what a system call does and how it works. System calls are the primary means of communication between user space and the kernel. Essentially, the user space program loads the system call number into the eax register and causes an interrupt (typically int 0x80), then the kernel executes some task on behalf of the user.

The kernel maintains an array called idt_table (defined in arch/i386/kernel/traps.c) that contains pointers to all the interrupt handlers. The processor has a register, set at boot time, that contains the address of the beginning of idt_table, so it always knows where to find the handler for a given interrupt.

The 0x80th entry is system_call (defined in arch/i386/kernel/entry.S). This assembly routine saves the register state, grabs the system call number from eax, and, if the number is within the range of valid system calls, calls the appropriate handler by looking it up in sys_call_table, an array of function pointers also defined in arch/i386/kernel/entry.S. When the system call finishes, system_call runs the scheduler if necessary, handles any pending signals, and then returns to user space.

About the scheduler

The scheduler resides in kernel/sched.c. It is invoked from a number of places, including the timer interrupt (if the current task uses up its allocation of processor time) and any time the system is about to return to user space (if the need_resched flag is set).

Usually, in user space, we talk about processes being uniquely identified by their Process ID (or PID). In the kernel, however, it is more convenient to refer to tasks by pointers to structures that include their state. Each task_struct (defined in include/linux/sched.h) holds a large amount of information about a process: pointers to its memory map and open files; its PID; pointers to the task_structs of its parent, siblings, and youngest child; its counter (decremented by the timer interrupt until it reaches zero, at which point the scheduler is called); and a "state" variable that can be TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE, TASK_ZOMBIE, or TASK_STOPPED.

Tasks enter the TASK_RUNNING state whenever they have work to do, and enter the TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state whenever they are sleeping or waiting for I/O (the main user-visible difference between INTERRUPTIBLE and UNINTERRUPTIBLE is that INTERRUPTIBLE tasks can be killed with CTRL-C or "kill").

The scheduler's job is to pick the highest priority process with a counter greater than zero from the set of all processes in the TASK_RUNNING state, and make that the currently running process. If there are no TASK_RUNNING processes, it runs the idle task. If there are TASK_RUNNING processes, but all of them have counters of zero, the scheduler resets the counters of all tasks to some small positive number (usually around 20), then picks the one with the highest priority.

Within the scheduler, the previously running task is referred to as prev, and the task about to be context switched in as next. When the switch is done, current is set equal to next.

Elsewhere in the kernel (most frequently in system calls), it is often convenient to use current to view or modify the task structure of the currently running task. For instance, if we have a system call that puts its caller to sleep, we might include the following code:

current->state = TASK_INTERRUPTIBLE;
schedule();
printk(KERN_DEBUG "waking up\n");

This will halt execution of this task within the kernel. The scheduler will notice that current->state is not TASK_RUNNING and remove current from the run queue. The third line will not be executed until some other part of the kernel puts the task back in the TASK_RUNNING state and returns it to the run queue. Suppose we add a timer to do just that:
struct timer_list timer;

init_timer(&timer);
timer.expires = jiffies + 10;                         /* expire in 10 jiffies */
timer.data = (unsigned long) current;                 /* set callback argument */
timer.function = (void (*)(unsigned long)) wake_up_process;  /* set callback function */
add_timer(&timer);
current->state = TASK_INTERRUPTIBLE;
schedule();
printk(KERN_DEBUG "waking up\n");

This code has a subtle race condition: what if the timer expires before current->state is set to TASK_INTERRUPTIBLE? The task will go to sleep forever, waiting for a timer that has already gone off.

The solution is to set the state before adding the timer.

struct timer_list timer;

init_timer(&timer);
timer.expires = jiffies + 10;                         /* expire in 10 jiffies (100 milliseconds) */
timer.data = (unsigned long) current;                 /* set callback argument */
timer.function = (void (*)(unsigned long)) wake_up_process;  /* set callback function */
current->state = TASK_INTERRUPTIBLE;
add_timer(&timer);
schedule();
printk(KERN_DEBUG "waking up\n");

This way, if the timer goes off after add_timer and before schedule, the scheduler will keep the current task on the runqueue, because wake_up_process reset its state to TASK_RUNNING. The code will continue past schedule as if nothing happened (except a 10 jiffy delay).

You may be wondering how all the other kernel code that may have run in those 100 milliseconds (a jiffy is usually 10 milliseconds) didn't mess up the stack. It turns out that Linux always allocates 8 kilobytes (two pages) for each task_struct. Only a small part is used for process state; the rest is used as the task's kernel stack. When the scheduler switches processes, it also switches to a different stack. The current task only has to worry about sharing its stack with interrupt handlers, not with other tasks executing in kernel space.

For your assignment, you don't need to put the current process to sleep, but it's good to be aware of the technique, since many system calls do put their callers to sleep.

Deliverables

Turn in a printout of your system call code, your user space program, a sample run of your program, and a printout of your modifications to sched.c and fork.c (with about 10 lines of leading and trailing context, not the whole file). You can work in groups of two or three if you'd like (you only need to turn in one copy per group).

Additional Documentation

A good tutorial for adding a new system call to Linux can be found here.