TCP/IP Illustrated V.2/Preface, and Chap. 1

These are instructor notes. We will look at the book in class.
There will be no attempt to reproduce everything.

--------------------------------------------------------------
Wright/Stevens preface:

Book organization:
    mbufs, 2
    domains 7
    socket layer 15-17
    pcbs 22
    data link early (3,4,5) and late, bpf 31
    ip/icmp/igmp, 6, 8, 9
    multicast 14
    ip routing, 18-20
    arp 21
    transports, udp 23, tcp 24-30
    raw ip/bpf 32/31
more or less, bottom-up

note BSD copyright: open source, and "it's not my fault" ...
but not the GNU copyleft

---------------------------------------------------------
1. note BSD coding style

    if (tp) {
            /* 2-tab indent */
    }

Vi [[ and ]] plus tab indents. Note the book shows 4-space indents,
which is NOT normal.

2. typographical conventions
    structure: mbuf{}
    variable name
    #define

3. man pages
    section 1 - commands (ls/vi/cc, netstat)
    section 2 - system calls (getpid, read, socket, bind, listen, ...)
    section 3 - library calls (fopen, fread, ntoa)
    section 4 - devices, and other curious system-related interface
                entities (ip, lo, ppp, icmp, tcp, wi)
    section 5 - files: ftpd.conf, resolv.conf, passwd ...
    section 8 - admin commands: halt, inetd, ifconfig, route, routed, mrouted

4. history

UCB/CSRG (Computer Systems Research Group), 4.4BSD.
Massive impact on the TCP/IP Internet ... the reference implementation.
4.4BSD-Lite (the end). The book's code ran under BSD/386 V1.1 ...
transition to the following: now evolved more in FreeBSD/OpenBSD/NetBSD.
E.g., www.freebsd.org

    4.2BSD - 1983 ... VAX (as it had virtual memory, Ethernet, IMPs),
        first BSD release with TCP/IP. Bill Joy and others.
    4.3BSD - 1986, tcp performance
    4.3BSD Tahoe - 1988, slow start, congestion avoidance, fast retransmit
    4.3BSD Reno - 1990, fast recovery, tcp header prediction,
        SLIP header compression, routing table changes
    4.4BSD - 1993, multicasting, long fat pipes. Net/3 is a synonym.

FreeBSD (x86 ...) / OpenBSD (security) / NetBSD (arch. portability).
Used by SunOS, pre-Solaris.
Some use in System V R4, but STREAMS/ISO had an impact there too.
AIX (IBM), and, probably most important, Wind River VxWorks.

APIs:
    TLI - transport layer interface, from Sys V. XTI is a superset.
    sockets - BSD sockets; Wintel mods for WinSock.

Tons of app code written for BSD sockets. This is why you find such
things as ftp/arp/netstat/nslookup under a Windows DOS box.

1.5 example program

note: "socket" means > 3 things ... (6 things ... don't forget the
TCP transport "socket", which is a 4-tuple)

note that errno could have been handled more explicitly

1.6 system calls and library functions

Unix V7 (say about 1979): around 50 system calls.
4.4BSD: about 135 (Section 2).
Currently in FreeBSD: /usr/lib/libc.a ... (kitchen sink)
    system call stubs and the stdio library

APP calls a function:
    push args onto stack or into registers
    make software interrupt call (e.g., Intel Arch. "int"),
        the old "trap" instruction
    kernel processing
    return value comes back in a register or some other way (0 | errno ...),
        with errno modified in user space

irony: BSD send/sendto/sendmsg are examples of fuzziness. It is not
always clear to the user programmer where the line is drawn between
user function, system call, and kernel-internal processing. There are
really THREE APIs:
    user (man page)
    library call into the kernel (undocumented)
    kernel work split up at the top half

socket may end up calling a kernel function with the same name,
but sendto is another matter entirely.
read ... is a "little" complex.

1.7 Network Implementation Overview

1. tcp/ip - the stack
2. XNS - Xerox stack
3. OSI
4. UNIX domain ...
unix sockets: same-machine IPC; pipes are now implemented internally here.
note that System V message queues are something else entirely.

note that the stack really has 4 layers:

    process          system calls:
                     socket/bind/read/sendto/recvfrom/write/close, ioctls
    socket layer
    protocol layer   (TCP/IP)
    interface layer  ethernet sub-layer, ppp, slip, loopback;
                     ethernet device driver, serial driver

BPF could be said to have a VERY SKINNY protocol layer (queueing and
process select/event management ...). Every device driver has a BPF "tap".

1.8 Descriptors

socket returns a descriptor; the descriptor is a handle.
process[descriptor table] -> open file entry -> vnode entry
Figure 1.5: generic file handling code TO socket code TO pcbs ... (addressing)
note: top-down and bottom-up mapping to write/read ...
note: select as a basic attribute

1.9 mbufs

internal kernel structure used for storage of data, network headers,
and some misc. functions.

note:   write --> ethernet
        read <--- ethernet

output: copy from user to kernel mbuf (socket code);
        copy from kernel mbuf to interface memory (driver)

at some point: Jim must go to the board and rant/wave hands about the
top-half/bottom-half model ... as it applies here, with one important
difference (soft priority levels on input)

1.10 input processing

read/input is initiated by an interrupt, therefore asynchronous.
there must however be a process that has done a read a priori and is blocked.
"they meet in the middle"

ethernet: allocate an mbuf and read i/o into it
    2k cluster is normal
    ethernet header is passed "on the side"
note: a sw interrupt is scheduled to cause IP input to run later

E.g., prioritization:
    user - lowest
    stack input processing (ip/tcp) - middle
    hw i/o - highest

ip input:
    loops on a queue of input mbufs; returns when done with the mbufs
    note: mbuf packet chain pointer used here
    does ip stuff:
        fragment reassembly
        ip hdr checksum processing
        possible routing
        new functions: firewall processing, IPDIVERT
    passes to udp

the big thing here is the pcb match: you have ip src, ip dst,
udp src port, udp dst port -- which *socket* might actually want
this packet?

mbuf processing ... when done with the udp part, remove the udp header
by simply moving the mbuf data pointer.

socket layer: mbuf chain with socket peer address first, data second
    i/o copy kernel to user
    wakeup of the sleeping process

note: mbufs on input are allocated by the driver, freed by the socket
layer; on output, allocated by the socket layer, freed by the driver
(or by tcp ... you have to pay attention)

1.11 network implementation overview revisited

picture ...
note: SPLIMP - hw interrupt processing level. why IMP?
      SPLNET - priority level for sw intr input processing

1.12 priorities according to the BSD kernel/stack

this is a logical view and must somehow be matched up to what the cpu
architecture provides. macros ...

danger: shared data structures (mbuf chains ...) between different
kernel layers:
    the socket layer wants to allocate mbufs and chain them up for
        processing going down, or release them on the way up
    interfaces want to do the "opposite" things
    ip wants to "munge them" and change the pointers

ip input therefore does:

    s = splimp();   /* block drivers */
    IF_DEQUEUE ...  /* get a pointer to an mbuf chain */
    splx(s);        /* unblock the drivers */

1.13 kernel code organization

look at the kernel ...