chap 10. IP fragmentation and reassembly

All the King's men ...

10.1 Introduction

    pkt too big: IP cuts it up unless the DF bit is set
    UDP yes (e.g., NFS), plus tunnels and other things may cause fragmentation.

    ip_id
    flags field (high-order 3 bits of ip_off)
    offset field, low-order 13 bits ... therefore every unit of offset
        signifies 8 bytes. Every fragment except the last must contain
        a multiple of 8 bytes
    bit 1 is DF
    bit 2 is MF

Figure 10.1
Figure 10.2 fragment of a 65535 byte datagram. WORST POSSIBLE FRAGMENTATION
    fragment offset computed from start of datagram
    e.g., ip_off=2 means we are 16 bytes out
    max. value of ip_off is 8189
        (assuming a 20-byte header: 65515 data bytes, and 8189 * 8 = 65512,
        so the last fragment carries the final 3 bytes)

10.2 code intro

    netinet/ip_var.h     reassembly data structures
    netinet/ip_output.c  fragmentation code
    netinet/ip_input.c   reassembly code

    ipq - the reassembly list; despite the name, not a queue, a *list*

10.3 fragmentation

Figure 10.6 determine fragment size
    if DF is set: error (EMSGSIZE) to the process, if UDP ...
        and, end to end, an ICMP error with suggested MTU back to ip_src
    otherwise
        len = (mtu - hlen) &~ 7
        rounded down to an 8-byte chunk, with the header left out
        if mtu is too small (len < 8) ... can't do anything
    note: len is how much data we send in each fragment

Figure 10.7 construct fragment list
    turn packet into fragment "shrapnel"
    the original mbuf chain becomes the first fragment, so start with the 2nd fragment

10.4 ip_optcopy function - IGNORE!

10.5 reassembly

    ip_input must 1st reassemble the packet before passing it up
    See Figure 10.11
    if either the fragment offset is nonzero, or MF is set ... then we have a fragment
    if both are zero, we have the whole thing
    See Figure 10.12

10.6 ip_reass function

    input: a fragment that we know belongs in the list, and a ptr to the list
    action: reassemble and give us a complete DG,
        or just glue the fragment in and hope for more

Figure 10.13 struct ipq
    next/prev pointers, list of datagrams
    ipq_ttl - time allowed for reassembly to occur
    ipq_p - protocol
    ipq_id - sequence id
    struct ipasfrag *next/prev - list of fragments
    struct in_addr ipq_src, ipq_dst

    4 fields required to id a fragment: ip_id, ip_p, ip_src, ip_dst

Figure 10.14 ipasfrag
    note: basically the ipv4 header with some fields overlaid
    ipf_next and ipf_prev for gluing the fragments of an IP DG together

Figure 10.15 ...
    ipq points to a list of potential ip DGs,
        doubly linked for fast insertion/deletion
    each ipq points to its fragments, ordered by fragment offset in the DG

    note the timeout in ipq (ipq_ttl)

    homebrew# grep ipq_ttl *.c
    ip_input.c:     fp->ipq_ttl = IPFRAGTTL;
    ip_input.c:     --fp->ipq_ttl;
    ip_input.c:     if (fp->prev->ipq_ttl == 0) {
    homebrew# grep IPFRAGTTL *.h
    ip.h:#define IPFRAGTTL 60 /* time to live for frags, slowhz */

    homebrew# netstat -s | grep frag
    ...
    0 fragments dropped after timeout ********************************************
    ...

    ICMP message sent if a fragment times out, from ip_dst to ip_src

Figure 10.16 ip hdr in fragment is overlaid with ipasfrag ...
Figure 10.17 fragment list ordered by offset

Figure 10.18 ip_reass function
    goal: take a fragment and put it into the list, possibly returning
        a complete IP datagram
    inputs:
        ip: fragment
        fp: possible ipq, or NULL if none was found
    leave the IP header out of the calculations ...
    if the fragment is dropped ... release memory
        why would it be dropped? no memory is 1 reason

    trivia: all IP host implementations must at least be able to put a DG
    of 576 bytes back together. This is one reason why UDP/RIP/TFTP/DNS/SNMP
    use that size as a maximum, and why OSPF is on top of IP.

Figure 10.19 if 1st fragment to arrive, create new ipq
    note: insque: on the VAX this was a CISC instruction,
    assumed to be part of the hw instruction set.
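
A rough user-space sketch of the lookup-then-create step described above: match on the 4 identifying fields, and if nothing matches, link a fresh entry onto the doubly linked list with its ttl set to 60. This is not the Net/3 code. The names `rq`, `rq_init`, `rq_find`, `rq_create` and `FRAG_TTL` are made up for illustration, `malloc` stands in for mbuf allocation, and the pointer updates are written out by hand where the kernel would call insque.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FRAG_TTL 60            /* mirrors IPFRAGTTL from the notes (slow-timeout ticks) */

/* Hypothetical stand-in for struct ipq; field names are illustrative only. */
struct rq {
    struct rq *next, *prev;    /* links come first, as insque/remque expect */
    uint8_t    ttl;            /* ticks left for reassembly */
    uint8_t    proto;          /* ip_p */
    uint16_t   id;             /* ip_id */
    uint32_t   src, dst;       /* ip_src, ip_dst */
};

/* Empty circular list: the head points to itself in both directions. */
void rq_init(struct rq *head)
{
    head->next = head->prev = head;
}

/* Return the entry matching all four identifying fields, or NULL if none. */
struct rq *rq_find(struct rq *head, uint16_t id, uint8_t proto,
                   uint32_t src, uint32_t dst)
{
    struct rq *q;

    for (q = head->next; q != head; q = q->next)
        if (q->id == id && q->proto == proto && q->src == src && q->dst == dst)
            return q;
    return NULL;
}

/* First fragment of a datagram: allocate an entry and link it right after head. */
struct rq *rq_create(struct rq *head, uint16_t id, uint8_t proto,
                     uint32_t src, uint32_t dst)
{
    struct rq *q;

    assert(head != NULL);
    q = malloc(sizeof *q);
    if (q == NULL)
        return NULL;           /* no memory: the caller would drop the fragment */
    q->ttl = FRAG_TTL;
    q->proto = proto;
    q->id = id;
    q->src = src;
    q->dst = dst;
    q->next = head->next;      /* same effect as insque(q, head) */
    q->prev = head;
    head->next->prev = q;
    head->next = q;
    return q;
}
```

Usage would be: call `rq_find` for every arriving fragment, and only call `rq_create` when it returns NULL. A fragment with the same id but a different protocol or address pair deliberately does not match.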
    insque/remque primitives: this is why the next/prev ptrs are at base offset ...
    note: prev/next fragment ptrs point to self (empty fragment list)

Figure 10.20 insque/remque
    ip_enq - insert fragment p just after fragment prev
    ip_deq - remove fragment p

    reassembly timeout: ttl of 60 ...
    ipq_ttl is decremented every time the kernel calls ip_slowtimo,
    2 times a second, therefore a fragment has 30 seconds to live.
    note: the kernel must have got fragment #1, as the first 64 bits must be
    sent in the ICMP error message,
    but then net/3 doesn't send the ICMP error either (which is probably a bad thing)
    acc. to the RFC, the timeout should be 60-120 secs.

Figure 10.21 find correct offset position in fragment list
    IMPORTANT: byte ranges within fragments MAY overlap ...
    Several Microsoft security problems have been due to DoS attacks
    based on this notion. Happened to linux too.

Figure 10.22 logically, a fragment could:
    overlap at the front
    overlap at the back
    be completely contained within
    be completely contained at an edge

Briefly ...

Figure 10.23 new fragment overlaps with the preceding one, trim it to the preceding one
    note m_adj used here

Figure 10.24 trim/discard existing fragments.
    current fragment overlaps the front of an earlier fragment:
    trim the earlier fragment, or discard it if completely overlapped

Figure 10.25 insert new fragment in its rightful place
    return 0 here means not done yet
    note the possible case where the last fragment has MF set ...
    therefore still not done, even though we appear to not be missing anything

Figure 10.26 it is all there... glue it together with m_cat

Figure 10.27 the new "packet" is actually a collection of mbufs glued together
    (not very efficient ...)

Figure 10.28 we have to make sure the ip hdr is in the 1st mbuf, and skipped
    in the subsequent ones. and the ip hdr has to be put back together again too.

Figure 10.29 datagram reassembly
    whew!

10.7 ip_slowtimo

    called every 500ms, 2 hz.
    traverse the ipq list
    decrement ttl
    if == 0 burn it ...
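
A small user-space sketch of the aging pass just described, under stated assumptions: every entry starts with ttl >= 1, `free` stands in for the fragment and mbuf cleanup, and the unlink is written out by hand where the kernel would use remque. The names `entry`, `slow_tick`, `ttl_seconds` and `SLOW_HZ` are invented for illustration and are not Net/3 identifiers.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOW_HZ 2     /* the slow timeout runs twice a second (every 500 ms) */

/* Hypothetical reassembly-list entry; only the links and the ttl matter here. */
struct entry {
    struct entry *next, *prev;
    unsigned      ttl;          /* assumed >= 1 while the entry is on the list */
};

/*
 * One timer tick: age every entry on the circular list, then unlink and
 * free the ones whose ttl reaches zero. Returns how many were dropped.
 */
unsigned slow_tick(struct entry *head)
{
    unsigned dropped = 0;
    struct entry *e, *victim;

    assert(head != NULL);
    e = head->next;
    while (e != head) {
        victim = e;
        e = e->next;                          /* advance before a possible free */
        if (--victim->ttl == 0) {
            victim->prev->next = victim->next; /* same effect as remque */
            victim->next->prev = victim->prev;
            free(victim);
            dropped++;
        }
    }
    return dropped;
}

/* Seconds an entry survives if it starts with the given ttl. */
unsigned ttl_seconds(unsigned ttl)
{
    return ttl / SLOW_HZ;
}
```

With a starting ttl of 60 and 2 ticks per second, `ttl_seconds(60)` gives the 30 seconds noted earlier. Stepping to the next entry before testing the current one is what makes it safe to free an entry in the middle of the walk.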
    ip_freef
        walk the ipq's fragment list, dequeue and free each fragment
        remque the ipq itself and free that too

    ip_drain
        may be called when more memory is needed during mbuf allocation
        burn the ipq list ...

Datagram identifiers: