chap 23: udp: user datagram protocol

23.1 introduction
    SOCK_DGRAM
    by default socket is unconnected

23.2 code intro
    See Figure 23.3 udp function relationship to rest of kernel
    Figure 23.3 global variables
    Figure 23.4 udp stats and Figure 23.5 sample udp stats
    note:
        1. dropped due to no socket: udps_noport
        2. dropped due to full socket buffer: udps_fullsock

23.3 UDP protosw structure
    Figure 23.8
    udp_input - messages from ip layer handed off here
    udp_ctlinput - called by ICMP on errors
    ip_ctloutput - admin requests from procs
    udp_usrreq - system calls call this (e.g., write)
    udp_init - boot-time init ...
    udp_sysctl - sysctl(8)
    also have udp_output, not in protosw ... to do writes

23.4 udp header
    Figure 23.9 udphdr structure
    Figure 23.10 layout on wire
    Figure 23.11 udpiphdr (ip plus udp hdr) + macros
    Figure 23.12 ipovly structure, not same as ip header

23.5 udp_init function
    Figure 23.13 init udb pointers

23.6 udp_output
    Figure 23.14, how write functions end up at udp_output
    udp_usrreq: PRU_SEND
    udp_output called
    args:
        inp - ptr to socket pcb
        m - mbuf chain
        addr - optional dst
        control - optional control
    Figure 23.15 udp_output, 1st part
        control info: throw out
        if there is an addr
            if already have faddr
                already connected ... therefore error
                (if you have already connected and then use sendto to specify an addr, that's an error)
            splnet blocks interrupt side
            call in_pcbconnect
        else no addr
            must already be connected
        prepend mbuf for udp/ip hdrs
            sosend will leave room if a cluster is allocated
    udp checksum and pseudo-hdr
        udp fills in some of the fields in the ip/udp hdrs and calculates the udp checksum,
        which covers some IP hdr fields; this is tricky
        3 rules help:
            1. 3rd 32-bit word in pseudo-hdr (Figure 23.17) looks similar to 3rd 32-bit word in ip hdr (Figure 23.16)
            2. order of 32-bit entities doesn't matter
            3. 0's don't matter
    Figure 23.16, shaded fields filled in by ip
    Figure 23.17, pseudo-hdr used for checksum computation
    Figure 23.18, udpiphdr used by udp_output
    Figure 23.19 ops to fill in hdr and csum
    Figure 23.20, last half of the udp_output function
        fill in extended udp hdr fields ... note pcb use
        calculate checksum
        fill in len/ttl/tos, not in csum
        call ip_output
        if address, disconnect

23.7 udp_input function
    ip_input calls it thru pr_input function
    called at splnet
    udp_input goal: place mbuf chain on appropriate socket input queue and wake up TH
    Figure 23.21, general validation
        strip options
        do pullup and make sure ip/udp in same buffer
        set uh to point to it
        make mbuf header match udp length
        save copy of ip header in case of ICMP problem
        if udp cksum on and there is a checksum (not 0)
            set ipovly fields to 0
            set length
            calc/test checksum
    ip_input removes ip hdr length from ip_len before calling udp_input
        1. ip_len == uh_ulen ... normal case
        2. ip_len > uh_ulen, ip datagram too big
           if so truncate, not an error
           if udp length wrong, likely have csum error
        3. ip_len < uh_ulen, discard it
    copy of ip hdr is saved before csum, as csum wipes fields
    checksum only verified if udp checksums are turned on via the udpcksum global variable (sysctl)
        net.inet.udp.checksum: 1
    note this is a buggy test: it depends on the udpcksum variable being on.
        it should depend only on whether the incoming packet carries a checksum.
        this bug has been widely copied.

    demultiplex unicast datagrams
    Figure 23.24: demux of unicast datagram
        try last cache of pcb ... lookup based on lport/fport/faddr/laddr
        note: highest probability here is that dest. port numbers are different, therefore test that 1st
        if cache check failed ... look it up via in_pcblookup
        if no pcb, send icmp unreachable/port
    note: Partridge and Pink [1993] paper suggests that one-behind cache is practically useless.
        servers for the most part are like so: (local-port, wildcard, wildcard, wildcard)
        connection-less client only specifies dest. address when using sendto
        most of the time: laddr, faddr, fport are WILDCARDs.
    Figure 23.25: udp_input deliver pkt to socket
        udp source port and ip src put in sockaddr
        if control ops
            IP_RECVDSTADDR returns dst ip address as control info
            udp_saveopt allocs mbuf and saves that info
            #ifdef notyet commented out
        take out ip/udp hdr from mbuf
        call sbappendaddr to append mbufs to socket recv buffer
        read-side wakeup for procs reading on socket queue

    demux multicast and broadcast datagrams
        datagrams delivered to all sockets that match
    Figure 23.26 demux of bcast/mcast datagrams
        we use last to show that a match was found and to avoid making
        mbuf copies unless we need to (there is > 1 copy)
        if multicast/bcast
            global sockaddr variable gets source port, source ip addr
            decrement ip/udp header from the mbuf
            loop thru all pcbs
                if local port fails to match
                    continue
                if local ip addr not wildcard
                    if local ip addr not ip dst
                        continue
                if foreign ip addr not wildcard
                    if foreign ip not ip src OR foreign port not udp src port
                        continue
                if last is set; that is, this is not the 1st match
                    copy of datagram placed on input queue
                    do sbappendaddr to put on socket rcv q; if that works, call sorwakeup
                save match in last pointer
                give up if SO_REUSEPORT or SO_REUSEADDR are NOT set ... must be set to continue
            if no matching pcb
                set counter and bail
            put last on socket input queue and call wakeup
    consider connected udp/multi-homed server problem on Figure 23.27

23.8 udp_saveopt function
    if IP_RECVDSTADDR socket option, you want the IP for the incoming interface
    Figure 23.28 udp_saveopt - create mbuf and save address, note use of cmsghdr

23.9 udp_ctlinput
    icmp dest. unreachable, etc., cause the protocol's control input function to be called.
    this is that function for udp.
    inputs:
        cmd - one of PRC_XXX as in Figure 11.19
        sa - contains src ip (except for redirect, in which case it contains the dst that should be redirected)
        ip - for some ICMP errors, ptr to ip hdr for error packet
    basically calls in_pcbnotify if it takes action at all
        passes udp_notify function as parameter
        udp_notify wakes up select callers ...

23.10 udp_usrreq function
    Figure 23.32 start/finish of udp_usrreq
        sotoinpcb converts socket to proto control block
        if req is PRU_CONTROL (ioctl call)
            pass to in_control
        error if no pcb and request other than PRU_ATTACH (socket)
        switch on request ...
        release:
            free control
            free mbuf
    Figure 23.33
        PRU_ATTACH (socket call)
            pcb must be null
            allocate pcb
            reserve socket buffer space
            set default ttl
        PRU_DETACH
            call udp_detach
    udp_detach, Figure 23.34
        fix up one-behind cache if it points to this pcb
        call in_pcbdetach
    PRU_BIND, Figure 23.35
        call in_pcbbind
    PRU_LISTEN
        error ...
    Figure 23.36 more
        PRU_CONNECT
            if faddr ip set, error
            call in_pcbconnect
            if ok, call soisconnected (state set to connected for socket)
        PRU_CONNECT2
            not supported (unix socket uses this)
        PRU_ACCEPT
            not supported
    Figure 23.37, PRU_DISCONNECT
        if foreign ip addr NOT set, error
        in_pcbdisconnect to do work
        set local ip addr to 0
        disconnect socket state
    PRU_SHUTDOWN
        call socantsendmore
    PRU_SEND
        call udp_output
    PRU_ABORT (not likely)
        disconnect socket
        call udp_detach
    PRU_SOCKADDR
        setsockaddr
    PRU_PEERADDR
        setpeeraddr
    PRU_SENSE, generated by fstat system call
        could return size of send buffer but does nothing
        return 0
    others are not supported, in Figure 23.40

23.11 udp_sysctl function
    turn checksum on ...
    has more functionality now, including
        net.inet.udp.log_in_vain: 1
            log scans on ports with no listener
        net.inet.udp.blackhole: 1
            don't return icmp error if udp port has no listener

23.12 implementation refinements
    cache considerations ... as linear search used for pcb lookup
    udp checksum
        one suggestion: combined copy and checksum

23.13 summary
    simple, much of work done by pcb common code.
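
appendix: checksum sketch (for 23.6)
    A minimal sketch, in C, of the udp checksum over the pseudo-hdr as described in 23.6. This is not the kernel's code: the kernel checksums an mbuf chain through the overlaid udpiphdr, while this version builds a separate 12-byte pseudo-hdr and works on a flat byte buffer. The names sum16, fold, udp_checksum and udp_checksum_ok are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to acc as big-endian 16-bit words.
 * An odd trailing byte is treated as the high byte of a zero-padded word. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        acc += (uint32_t)(p[0] << 8);
    return acc;
}

/* Fold the carries back into the low 16 bits, then take the complement. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)~acc;
}

/* Sum the pseudo-hdr (src, dst, zero, protocol 17, udp length)
 * followed by the udp hdr and data. */
static uint32_t udp_sum(const uint8_t src[4], const uint8_t dst[4],
                        const uint8_t *udp, uint16_t udp_len)
{
    uint8_t pseudo[12];
    int i;

    for (i = 0; i < 4; i++) {
        pseudo[i] = src[i];
        pseudo[4 + i] = dst[i];
    }
    pseudo[8] = 0;                          /* zero byte: adds nothing */
    pseudo[9] = 17;                         /* IPPROTO_UDP */
    pseudo[10] = (uint8_t)(udp_len >> 8);   /* udp length, network order */
    pseudo[11] = (uint8_t)(udp_len & 0xff);

    return sum16(udp, udp_len, sum16(pseudo, sizeof pseudo, 0));
}

/* Checksum to place in the udp hdr. The checksum field (bytes 6-7 of udp)
 * must be zero on entry. A computed 0 is sent as 0xffff, because 0 on the
 * wire means "no checksum". */
uint16_t udp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *udp, uint16_t udp_len)
{
    uint16_t ck;

    assert(udp_len >= 8);                   /* at least a full udp hdr */
    ck = fold(udp_sum(src, dst, udp, udp_len));
    return ck == 0 ? 0xffff : ck;
}

/* Receiver side: with the checksum field filled in, the folded
 * complement of the whole sum is 0 for an undamaged datagram. */
int udp_checksum_ok(const uint8_t src[4], const uint8_t dst[4],
                    const uint8_t *udp, uint16_t udp_len)
{
    return fold(udp_sum(src, dst, udp, udp_len)) == 0;
}
```

    usage: zero the checksum field, call udp_checksum, store the result in bytes 6-7 in network order; the receiver calls udp_checksum_ok on the datagram as received. Swapping src and dst gives the same checksum, which is rule 2 above (order doesn't matter), and the zero byte in the pseudo-hdr contributes nothing, which is rule 3.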