Chapter 4 - Debugging Techniques kernel code debugging can be tricky consider: how do we debug top-half code from a user process? mode switch ... gets in the way. interrupt-side is an even worse question: plus it's async. top-half vs bottom-half interactions events/races races from interrupts/SMP (> 1 cpu) how exactly do you think one user process can debug another user process anyway? (man ptrace). approaches then are: 1. debugging by printing 2. debugging by querying (ask the kernel for info) 3. debugging by watching (e.g., watch behavior of user process 4. debugging by READING THE ERROR MESSAGES (by paying attention) oops - null pointer reference 5. kernel debuggers and other tools Various kernel config options exist that may be of help. See pp 73-75. note CONFIG_DEBUG_INFO for example. -------------------------------------- debugging by printing printk loglevels/priorities included with messages macros: from highest/worst to lowest/info KERN_EMERG high priority low number KERN_ALERT KERN_CRIT KERN_ERR KERN_WARNING KERN_NOTICE KERN_INFO KERN_DEBUG lo priority hi number #define KERN_EMERG "<0>" /* system is unusable */ #define KERN_ALERT "<1>" /* action must be taken immediately */ #define KERN_CRIT "<2>" /* critical conditions */ #define KERN_ERR "<3>" /* error conditions */ #define KERN_WARNING "<4>" /* warning conditions */ #define KERN_NOTICE "<5>" /* normal but significant condition */ #define KERN_INFO "<6>" /* informational */ #define KERN_DEBUG "<7>" /* debug-level messages */ printk(KERN_EMERG "System halted.\n"); no priority means a default is used. This is not a good idea. default is: DEFAULT_MESSAGE_LOGLEVEL, which is probably KERN_WARNING. int in kernel console_loglevel value determines whether messages go to the virtual console or not. note logging system consists of both klogd/syslogd on linux ... syslogd config may mean messages go to another computer entirely. messages are probably stored in /var/log/messages this is unrelated to console_loglevel console_loglevel may be adjusted by using klogd -n or by setting kernel.printk value (4 values) or [kernel]# sysctl -a | grep print kernel.printk = 1 4 1 7 setting /proc value [kernel]# cat /proc/sys/kernel/printk 1 4 1 7 all kernel messages will go to console echo 8 > /proc/sys/kernel/printk [root@zymurgy kernel]# sysctl -a |grep printk kernel.printk = 8 4 1 7 first is console loglevel. 2nd is default level for messages. ------------------------------------------------- How messages get logged printk writes messages into circular queue in kernel. at some point they will wrap. therefore this means what about bottom-half printk writes? wakes up any processes blocked on reading that ... sleeping on syslog(2) or reading /proc/kmsg klogd likely reads /proc/kmsg dmesg is use app that simply dumps the queue. # dmesg # less /var/log/messages to see what klogd did with the messages read from the kernel (or written to the console or the network) btw: printk can be used in a interrupt handler ---------------------------------------------- klogd reads (and consumes) kernel messages from the buffer hands them off to syslogd which checks /etc/syslog.conf to decide what to do about the situation ... may write locally to syslog may use UDP syslog port/protocol to ship messages off system. UDP port 514 (TCP port 514 is old shell) it may be worthwhile to make the buffer bigger for debug. turning the messages on/off 1. it is a good idea to take a moment and think about the design of your printk/debug architecture. 2. it is a good idea to NOT REMOVE IT when "done". as there may be bugs later. two approaches, macro or C code branch test macro: see p. 80 #undef PDEBUG #ifdef SCULL_DEBUG #ifdef __KERNEL__ # in case we are in the kernel #define PDEBUG(fmt, args...) printk(KERN_DEBUG "scull: " fmt, ## args) #else # user space #define PDEBUG(fmt, args...) fprintf(stderr, fmt, ## args) #endif #else #define PDEBUG(fmt, args...) /* it's off jim */ #endif so if we compile with -DSCULL_DEBUG we turn PDEBUG on in our code See Makefile code, p. 102 note: we need a gcc compiler with extensions to support macros with variable arguments. GNU make is also needed. code/branch int debug = 0; /* or 1 */ could be input parameter #ifdef DEBUG if (debug) { print ... } #endif what are the tradeoffs between these two approaches? Rate Limiting: too many debug messages may overflow console/buffer, etc. possible problem scenario: you have a fatal error message: my network card is KERFLUUEY! you receive 100000 interrupts because you are being attacked ... (formally the problem of livelock is being exacerbated because every interrupt causes a print message and klogd/syslogd to try to log a message to a file). 1. you can set a flag in the driver to say: I have complained about this, and I don't need to do it twice (or 5 times). 2. kernel has printk_ratelimit(void); if (printk_ratelimit()) { printk(KERN_NOTICE "The printer is still on fire\n"); } if a threshold (settable with sysctl) is exceeded, messages are dropped. or /proc/sys/kernel/printk_ratelimit - # of secs to wait before reenabling msgs /proc/sys/kernel/printk_ratelimit_burst - # of msgs accepted before rate limiting. Printing device numbers has utility macros for major/minor #. see p. 82 int print_dev_t takes dev 2-tuple and encodes in string buffer. returns count of chars in buffer. char *format_dev_t takes dev 2-tuple and encodes in string buffer. returns modified buffer buffers should be provided, and should be at least 20 bytes long Debugging by querying EVERY LINE PRINTED CAUSES FILE I/O (or network i/o ...) printk can slow down the system so much that race bugs are masked. device driver writer knows this rule: it ain't debugged 'til the printfs are *off* ... so one might query the system (pull rather than push) as opposed to having the system blab on every event "just the facts, ma'am". three ideas here: 1. put a "dynamic" file in /proc somewhere; that is, you write out the values of variables. e.g., cd /proc/net/arp ps/top/uptime and many other utilities read /proc for kernel state cat arp ... you get the arp table ... (arp(8) uses this) driver can create read-only proc file ... function is called in driver when /proc/mycode is read. see create_proc_read_entry, p. 105 for registration of a function to be called when file is read see read_proc entry point p. 104. for the function itself 2. use ioctls (to do about anything ... this is a device driver entry point, and you can put any kind of state machine you want in with ioctls for moving data to/from the kernel. 3. use sysfs, see chap. 14. using the /proc filesystem /proc exists for kernel to export info to world. may import in some cases (read/write, but assume readonly) entries may be dynamically inserted by modules, removed at rmmod time new recommended way is believed to be sysfs, as /proc is thought to be a mess. too late ... overview: 1. you must register a proc file 2. you must delete your proc files when done 3. you must pass a callback function in at registration that will process i/o when a read occurs from proc device, kernel allocates a PAGE of memory, and calls your registered: read_proc function, see p. 84 function returns directly: # of bytes output. *start is NULL to start with, you advance it if and only if you have a logical offset. see code on p. 85 creating your own /proc file see code p. 86, create_proc_read_entry registration function note deregistration routine as well. proc problems include: . no reference count, therefore /proc file can be removed when in use. . no attempt to distinquish two instances ... seq_file interface implementation of large file dumps is harder. seq_file assumes you are stepping thru a large file by a sequence of items you must create "iterator" object, see p. 87 for details. need 4 iterator methods: 1. start 2. next 3. stop 4. show how a sequence is implemented is up to the module. bytes/structures whatever. book sez: may be viewed as a "cursor" on a byte array. use this if you have more complex data structures or more output the ioctl method can easily read/write int variables or structures too and from the kernel. app program must be written and kept in sync with the kernel trade-off: ioctl is less obvious than /proc debugging by watching. p. 108 basic idea: track behavior of user-level apps; i.e., watch them. app-level printfs/debug/syscall-call tracing all possible methodologies strace command. prints out info about system calls. Note you may need to install it. you may see errors on calling or on return values. try # strace ls find the open where "." is opened (note you may have any amount of strange load module init syscalls) # strace cat strace gets info from the kernel and therefore a prog. need not be compiled for debug. bottom line: strace useful for runtime syscall errors. perror(3) may not be used or not verbose enough. debugging system faults driver errors can result in system faults not likely to be a panic (fatal error) oops message possible. e.g., null pointer dereference will cause an oops message. or "stackicide" char *f; f = 0; *f = 1; page fault generated and oops dump occurs eip is likely most useful info. and note: info in /var/log/messages see code p. 94 for examples of faults run the faulty module problem: if we are given a hex address, how to map hex address to C-level? 1. note: gdb can disassemble/dump instructions e.g., gdb can do (gdb) x/20i (print as instructions ...) OR (gdb) x/20i fault_write This GDB was configured as "i386-redhat-linux"...(no debugging symbols found)... (gdb) x/20i 0x0 : push %ebp 0x1 : mov %esp,%ebp 0x3 : mov $0x0,%eax *********************************************** previous was the instruction 0x8 : pop %ebp 0x9 : ret 0xa : mov %esi,%esi or use the compiler for that matter cc -S foo.c gives foo.s note: use the same make incantation else you don't get the same code of course. next: stackicide example on p. 95 note how oops message is different. note stack dump: EIP have ffffffff in them. how did that get there? kernel must be built with CONFIG_KALLSYMS option to get function backtrace system hangs: the system may experience a complete hang; aka "silent death". silent death is no fun at all. what happens if your module does this: while(1) ; adding printk is perhaps the best option but WRITE TO THE CONSOLE (you don't want buffering here) change the console_loglevel value though if in industry you might have to use a logic analyzer you can use schedule call or if lucky enough to be on SMP system, not totally toast hint: system may hang keyboard or somesuch, but you can usually tell if the kernel is still scheduling or still working at the interrupt level. 1. put up an xclock for the former 2. ping it for the latter there can be a magic sysrequest (Alt-somekey) capability compiled into the kernel. You must do this yourself (make it be in the kernel). see p. 97 for more info. sometimes a good idea to NOT have the X server enabled. file systems: journaling file systems really a good idea ... ext3/reiserfs > ext2 NFS remote mount also good because your file system syncronization is in the remote NFS server, not your kernel Debuggers/etc: possible flavors:; 1. use gdb to look at kernel. can't set breakpoints, etc. can read variables (need to know some tricks). You use the core dump feature, but it isn't a core dump. It's a magic kernel /proc file that simulates a core dump. 2. kdb kernel debugger. low-level assembly monitor. 3. "kgdb" full kernel debug with remote system. must use serial connection or TCP connection. gdb "attach" function is used. (remote kernel is similar to attaching to a process that was not spawned by the debugger). 4. odds/ends gdb "core" debug option compile the kernel for debug. CONFIG_DEBUG_INFO # gdb /usr/src/linux/vmlinux /proc/kcore vmlinux is not the compressed boot version. It is needed for symbol table info. gdb symbol_table magic_memory_file magic_memory_file may be live kernel or dead core in the real world. /usr/src/linux/vmlinux is uncompressed kernel /proc/kcore - "core file", but in this case the kernel is live. kernel should be compiled with gdb and not stripped of symbols gdb does not have the normal abilities if a single computer is being used. cannot set breakpoints can disassemble you can use objdump foo.o to do this as well (gdb) p jiffies etc. should work BUT. can print structures, follow pointers. due to the fact this is for core debug, gdb caches data, therefore live kernel variables may be cached (and wrong). (gdb) core-file /proc/kcore to flush gdb cache to load a loadable module into the debugger, see pp. 100-101 1. debugger must be taught where in kernel address space module's text/data have been loaded. need base addresses. /sys/module under /sysfs load scull -> /sys/module/scull/sections contains this info 2. load gdb incantation as shown on p. 101 kdb kernel debugger on BSD you can use this, but on linux may not be available. machine-level debugger can get as patch from oss.sgi.com magic key (pause/break) gets you into the debugger do not leave this debugger/key in for "customer" systems kgdb patch debug via 2 computers over serial line (this is the full meal deal) or over TCP for delux version full gdb debug breakpoints/single-step etc. crash dump analyzers traditional unix kernel tool. kernel patches required. idea: kernel crash(oops) occurs, write memory/registers info to disk space (swap ...) you specify in advance at boot, you recover info and examine it. user-mode linux runs on virtual linux machine process on top of that easy to use gdb has limited access to hardware (and how could you debug race conditions?) dynamic probes debug tool allows "probe" (code in a specialized language) to be put anywhere in kernel. arbitrary code executed when you hit a certain execution point http://oss.software.ibm.com