chap 8: ip

BIG CHAPTER!
basic processing: input/forwarding/output/checksum/setsockopt & getsockopt syscalls
RFC 791 + RFC 1122

Chapter 9 - we SKIP IT!
Chapter 10 - unfortunately we do not skip it ... fragmentation/reassembly

reminder
    ipintrq ... bottom-half
    but splimp for device driver to put pkt on it
    splnet for ip to process
    do reassembly at final end system (ip_dst)
    may send icmp_error as appropriate

See Figure 8.1

8.2 code intro

net/route.h - route structure (rtentry)
netinet/ip.h - ip header
netinet/ip_input.c - input
netinet/ip_output.c - duh ...
netinet/in_cksum.c - duh again

globals of interest
See Figure 8.3
    ip_id - last one used
    ipstat - ip stats dumped with netstat -s
See Figure 8.4

now
# netstat -s
... leaving out the other protos

ip:
    33279 total packets received
    0 bad header checksums
    0 with size smaller than minimum
    0 with data size < data length
    0 with ip length > max ip packet size
    0 with header length < data size
    0 with data length < header length
    0 with bad options
    0 with incorrect version number
    0 fragments received
    0 fragments dropped (dup or out of space)
    0 fragments dropped after timeout
    0 packets reassembled ok
    31969 packets for this host
    7 packets for unknown/unsupported protocol
    0 packets forwarded (0 packets fast forwarded)
    1303 packets not forwardable
    0 packets received for unknown multicast group
    0 redirects sent
    34154 packets sent from this host
    0 packets sent with fabricated ip header
    0 output packets dropped due to no bufs, etc.
    0 output packets discarded due to no route
    0 output datagrams fragmented
    0 fragments created
    0 datagrams that can't be fragmented
    0 tunneling packets that can't find gif
    0 datagrams with bad address in header
icmp:
    21 calls to icmp_error
    0 errors not generated 'cuz old message was icmp
    Output histogram:
        destination unreachable: 21
    0 messages with bad code fields
    0 messages < minimum length
    0 bad checksums
    0 messages with bad length
    0 multicast echo requests ignored
    0 multicast timestamp requests ignored
    Input histogram:
        destination unreachable: 7
    0 message responses generated
    0 invalid return addresses
    0 no return routes
    ICMP address mask responses are disabled

Figure 8.5: netstat -s
(linux has netstat -p proto by the way, which is handier)

8.3 IP Packet

figure 8.7 review ...
figure 8.8 had better be review
figure 8.9, what you find in netinet/ip.h
btw: all fields that are short in the book are u_short now
ip_hl is usually 5 (32-bit words, i.e., a 20-byte header) ... the header can at max be 60 bytes (ip_hl = 15),
meaning 40 bytes of options ... which is one of the two reasons options are not very good/useful.

8.4 ipintr

interface queues packet on ipintrq, calls schednetisr ...
now ipintr runs, priority now splnet

ipintr in 4 parts:
    1. verification of incoming packets (sane?)
    2. option processing (ignore!), forwarding (not ignore)
    3. packet reassembly, chap 10
    4. demultiplexing (kick it upstairs)

figure 8.11
    is the goto good or bad?
    actually more important: why does it exist? (at least 2 reasons)

Verification
Figure 8.12
    note checksum check as a most curious item
        value should be 0 if undamaged
    ip_len converted to host byte order ... otherwise we can't use it
        same for ip_id, ip_off ...
        forwarding has to put these things back to net byte order
    byte check ...
        ip_len > bytes received
            ok, the bytes didn't make it, toss it
        bytes received > ip_len
            ip packet can easily be smaller than min ethernet pkt (padding) ...
            so deal with that (trim the excess)
    now in theory we have a sane packet

To forward or not to forward?
figure 8.13
    if we have option data then do it ...
    ip_dooptions decides if we are NOT done with it (returns 0)
    or we are (returns 1, therefore go for next packet)
figure 8.14 ...
    dst is a unicast addr or one of 4 kinds of broadcast, including directed and limited.
    note: it is legal for a host to reject pkts that arrive on the wrong interface,
    but BSD does not do that. It is not a good idea to do so ... (reduces routing redundancy).

Reassembly and demultiplexing
Figure 8.15
    Reassembly in Chap 10.
    note: ip_p used as table jump to transport
    note: ip/tcp input is a function call ..., not something more complicated

8.5 forwarding

ip_forward function

ip_forwarding must be set to 1: how?

at boot:
/etc/rc.conf
    gateway_enable="YES"
leads to

    case ${gateway_enable} in
    [Yy][Ee][Ss])
            echo -n ' IP gateway=YES'
            sysctl net.inet.ip.forwarding=1 >/dev/null
            ;;
    esac

i.e.,
    sysctl -w net.inet.ip.forwarding=1

---------------------------------------------------------
Basic route structure:

    struct route {
            struct rtentry *ro_rt;   /* real brains here */
            struct sockaddr ro_dst;  /* destination of the route (the lookup key) */
    };

dst is *key* for lookup ...

Figure 8.17 ip_forward

ip_forward
    we will ignore source routing ...
    ip ptr set to data
    if link broadcast or in_canforward fails on dst
        free m and bail
    in_canforward makes sanity checks
        loopback ... nope
        net 0 or class E ... nope
        class D, should be processed by ip_mforward, not here
    put ip_id back to net order
    decrement the ttl (by 1 ...)
    call icmp_error if ttl time exceeded
    ip_forward caches the most recent route ...
    if no cache or dst doesn't match (cache failed)
        free by reference count
        rtalloc called to look up the route via a sockaddr ...
            note sockaddr setup
        if fails send ICMP host unreachable
    reset cache ptr
    save at most *8 bytes*, not 64 ... don't actually use m_copy either at this point
    note: we are not done yet!

Figure 8.19
    note redirect rules:
        1. came in and goes out on same interface
        2. don't do it if done before
        3. not default route
        4. enabled to send redirects (on by default)
        5. not src rt
    if pkt indeed ok so far, AND src is on the local i/f's (sub)net
        get possible dest
            if gateway ... use that
            else get out of packet itself
        send only host route redirect, not net
        note that in a subnetted environment ... subnet mask is not included ...
        therefore subnet redirects are ambiguous

Figure 8.20
    call ip_output to forward the packet
        IP_ALLOWBROADCAST would allow directed broadcast
    if not an error and no icmp needed
        free the pkt
    switch on icmp error type
        note type was set in Figure 8.19 ...
        so we have remaining two kinds of 'unreachable' (host, fragmentation needed)
        and source quench
    note: switch translates system errors into ICMPs
See Figure 8.21

8.6 Output Processing: ip_output function

called from at least 2 places:
    1. ip_forward
    2. transports
transports just call it directly
    not protocol independent
on the other hand, routing sockets are protocol independent and do use pr_output

header initialization
Figure 8.22
    options are merged in
    form ip pointer
    fill in IP header as long as not forwarding and not raw
    note: IP_FORWARDING set by ip_forward/ip_mforward
        prevents this code from munging already set header values
See Figure 8.23
    note: MSG_DONTROUTE used in flags in send/* sets IP_ROUTETOIF per write
    OR SO_DONTROUTE sets IP_ROUTETOIF for all writes
    transports pass it down.

route selection
Figure 8.24
    if no route ...
        udp/tcp have per socket route cache
        if none, set ro to point to ip local route cache
    set dst pointer
    if there is one (a cached route)
        make sure it is up and that dst matches
        if invalid ... free reference and start again
    init sockaddr
    if routing to interface only
        must match ip_dst to interface somehow
        ifa_ifwithdstaddr searches pt/pt interfaces
        ifa_ifwithnet searches others
        may be of use to routing protocols for ignoring routing tables
    locate route
        call rtalloc to look it up
    note we extract interface address, ifp, and dst if route is gateway ... why the dst?
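
The "routing to interface only" branch above (IP_ROUTETOIF) is reached from user land in the two ways noted under Figure 8.23. A minimal sketch of both, using only standard sockets calls; the helper names set_dontroute, get_dontroute and send_dontroute are made up for these notes and are not from the book or the kernel:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Per-socket switch: SO_DONTROUTE applies to every later write on s.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_dontroute(int s, int on)
{
    return setsockopt(s, SOL_SOCKET, SO_DONTROUTE, &on, sizeof(on));
}

/* Read the switch back: 1 if set, 0 if clear, -1 on error. */
int get_dontroute(int s)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(s, SOL_SOCKET, SO_DONTROUTE, &on, &len) < 0)
        return -1;
    return on != 0;
}

/* Per-write switch: MSG_DONTROUTE affects this one sendto() only.
 * dotted is a dotted-quad string such as "192.0.2.1" (hypothetical value).
 * Returns bytes sent, or -1 on error. */
ssize_t send_dontroute(int s, const void *buf, size_t n,
                       const char *dotted, unsigned short port)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, dotted, &sin.sin_addr) != 1)
        return -1;
    return sendto(s, buf, n, MSG_DONTROUTE,
                  (struct sockaddr *)&sin, sizeof(sin));
}
```

With either switch the destination has to be on a directly attached network (or the far end of a pt/pt link), since the routing table is skipped; otherwise expect the write to fail.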
source address selection/fragmentation
Figure 8.25
    make sure we have ip_src
    fragment if necessary
    so ip_src is 0 at this point (not if tcp in use ...)
        note that forwarded pkts have it, tcp pkts have it ... pkt originating here may not
        we set it to interface address ...
    make broadcast validity checks
    if pkt fits in the mtu we can send it NOW
        restore various headers to net order
        call if_output ...
    a fragmenting we go ...

8.7 in_cksum

in theory two operations dominate the time:
    1. data copy
    2. checksum computation
    ++++
    3. route lookup ... when we have 1 zillion routes and we are forwarding everything.
       not true of host.

Figure 8.26
    optimized versions of csum algorithm

1. adjacent bytes to be checksummed are paired to form 16-bit ints and
   one's complement sum of these ints is formed.
2. to generate: checksum field itself is set to 0, 16-bit one's complement sum
   computed over bytes, and the complement of the result placed in checksum field.
   It is NOT byte-swapped.
3. to verify: compute over same set of bytes, including checksum field.
   If the result is all 1's ... -0 in one's complement arithmetic, the check was ok.
   0 is both all 0's and all 1's.

If no bits are altered: call the one's complement sum of all the other ip header fields a.
The stored checksum is the complement of a, i.e., -a.
The verify sum covers all the ip header fields PLUS the stored checksum,
so the result should be a + -a, which is 0.

figure 8.27 - naive implementation
figure 8.28 - ouchy version, in_cksum deals with mbuf chains too
note p. 239 ... incremental checksum update at forwarding time is possible,
since only the ttl field changes

8.8 setsockopt/getsockopt system calls

setsockopt
getsockopt
    allow many "whacky" features
    from the socket programming view, different layers,
    e.g., tcp/udp/ip/icmp/multicast have different options available

    man tcp
    man udp
    man ip (multicast here too)
    etc ...

e.g., set the ttl on a socket

    int ttl = 60;   /* max = 255 */
    setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));

so as in Figure 8.31, calls to the IP layer, for set/get pass thru here
    note: parameters passed in mbuf
    socket must lead to pcb ... which is where info will be stored

Figure 8.32
    PRCO_SETOPT processing
    result is either int set or flag bit set ...
    getopt is just the reverse
        read to mbuf and write it back out in socket land

8.9 ip_sysctl function

forwarding/redirects/defttl all settable ...
actually ...

    net.inet.ip.portrange.lowfirst: 1023
    net.inet.ip.portrange.lowlast: 600
    net.inet.ip.portrange.first: 1024
    net.inet.ip.portrange.last: 5000
    net.inet.ip.portrange.hifirst: 49152
    net.inet.ip.portrange.hilast: 65535
    net.inet.ip.forwarding: 1
    net.inet.ip.redirect: 1
    net.inet.ip.ttl: 64
    net.inet.ip.rtexpire: 3600
    net.inet.ip.rtminexpire: 10
    net.inet.ip.rtmaxcache: 128
    net.inet.ip.sourceroute: 0
    net.inet.ip.intr_queue_maxlen: 50
    net.inet.ip.intr_queue_drops: 0
    net.inet.ip.accept_sourceroute: 0
    net.inet.ip.fastforwarding: 0
    net.inet.ip.keepfaith: 0
    net.inet.ip.gifttl: 30
    net.inet.ip.mvifinput: 1
    net.inet.ip.subnets_are_local: 0
    net.inet.ip.fw.enable: 1
    net.inet.ip.fw.one_pass: 1
    net.inet.ip.fw.debug: 1
    net.inet.ip.fw.verbose: 1
    net.inet.ip.fw.verbose_limit: 100
    net.inet.ip.fw.dyn_buckets: 256
    net.inet.ip.fw.curr_dyn_buckets: 256
    net.inet.ip.fw.dyn_count: 0
    net.inet.ip.fw.dyn_max: 1000
    net.inet.ip.fw.static_count: 14
    net.inet.ip.fw.dyn_ack_lifetime: 300
    net.inet.ip.fw.dyn_syn_lifetime: 20
    net.inet.ip.fw.dyn_fin_lifetime: 1
    net.inet.ip.fw.dyn_rst_lifetime: 1
    net.inet.ip.fw.dyn_udp_lifetime: 10
    net.inet.ip.fw.dyn_short_lifetime: 5
    net.inet.ip.fw.dyn_grace_time: 10
    net.inet.ip.maxfragpackets: 436
    net.inet.ip.check_interface: 0
    net.inet.icmp.maskrepl: 0
    net.inet.icmp.icmplim: 200
    net.inet.icmp.drop_redirect: 0
    net.inet.icmp.log_redirect: 0
    net.inet.icmp.icmplim_output: 1
    net.inet.icmp.bmcastecho: 0

whew ...
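
One way to tie 8.8 and 8.9 together from user land: a fresh socket should normally start out with the system default ttl (net.inet.ip.ttl in the list above), and a setsockopt on that socket overrides it for that socket only. A small sketch of the set/get round trip; get_ttl and set_ttl are names invented for these notes, and the "fresh socket reports the sysctl default" part is an assumption about the host, not something the sketch enforces:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the ttl the IP layer will use for packets sent on s.
 * Returns the ttl, or -1 on error. */
int get_ttl(int s)
{
    int ttl = 0;
    socklen_t len = sizeof(ttl);

    if (getsockopt(s, IPPROTO_IP, IP_TTL, &ttl, &len) < 0)
        return -1;
    return ttl;
}

/* Override the ttl for this socket only (valid range 1..255).
 * Returns 0 on success, -1 on error. */
int set_ttl(int s, int ttl)
{
    return setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
}
```

Usage: open a socket with socket(AF_INET, SOCK_DGRAM, 0), print get_ttl(s) to see the default, call set_ttl(s, 60), and get_ttl(s) should then report 60 while other sockets keep the default.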