Chap 7: domains and protocols

7.1 intro

High-level garp:

problem statement: we have different address families at the socket layer.
How do we conceptualize that and yet keep one set of socket code
in common amongst the AFs?

domain: - group of related protocols.  Each has a protocol family constant

    PF_INET  ... tcp/udp, etc.
        man 4 inet (section 4 is devices and device drivers and well ...)
    PF_UNIX  - unix socket, local IPC, same as PF_LOCAL (POSIX name)
        man 4 unix
    PF_ROUTE - routing socket (talk to routing table, get events
        from routing table)
        man 4 route

Protocol family and address family are 1-1, so for all practical purposes
PF_INET == AF_INET

7.2 code intro

    sys/domain.h        - domain structure definition
    sys/protosw.h       - protocol switch structure definition
    netinet/in_proto.c  - IP domain and protosw structures
    kern/uipc_domain.c  - initialization and search functions

global variables

a domain has protocols

    var             datatype            description
    domains         struct domain *     linked list of domains
    inetdomain      struct domain       guess ...
    inetsw          struct protosw      array of protosw for Inet
    max_linkhdr, etc

See Figure 7.17 at this point

7.3 domain structure

Figure 7.5

struct domain
    dom_family      AF_INET
    dom_name        internet
    dom_init        ...
    dom_externalize/dom_dispose
                    UNIX socket can pass fds from one proc to another
    protosw pointer
    dom_next        ptr to next domain

    chapter 18 for these:
    dom_rtattach    - init routing table
    dom_rtoffset    - arg to previous in bits
    dom_maxrtkey    - for routing layer

7.4 protosw structure

domain has list of protocols

    pr_type         SOCK_STREAM, SOCK_DGRAM, SOCK_RAW ...
                    (others not real)
    pr_domain       back to domain
    pr_protocol     proto number, protocols are numbered within the domain
    pr_flags        see Figure 7.9
        PR_ATOMIC       - each request maps to one single protocol request
        PR_ADDR         - passes addresses with each datagram
        PR_CONNREQUIRED - connection oriented
                          (previous 2 are mutually exclusive)
        PR_WANTRCVD     - notify protocol when process receives data
        PR_RIGHTS       - supports access rights
    pr_input        - input from below
    pr_output       - output from above
    pr_ctlinput     - control input from below
    pr_ctloutput    - control output from above
    pr_usrreq       - user request from process
    pr_init         - initialization hook at boot for protocol
    pr_fasttimo     - fast timeout, 200ms, e.g., tcp timers
    pr_slowtimo     - slow timeout, 500ms, some timers may use this too
    pr_drain        - give me some space please ...
    pr_sysctl       - sysctl for protocol

Figure 7.10 relationship between pr_type and flags

Figure 7.11 protocol entry points

    pr_usrreq       read/write process requests
    pr_output       data output going down a layer
    pr_input        protocol input from layer below to layer above
                    e.g., tcp packets coming in

    it's not data:
    pr_ctloutput    top down, socket options
    pr_ctlinput     from below, e.g., ICMP errors

7.5 IP domain and protosw structures

domain/protosw structures are all static/compile-time declared
and initialized.  For IP, we have the inetsw array of protosw structures

See Figure 7.12 and Figure 7.13

Why raw?
    1. to implement a new transport, maybe RUP
    2. traceroute

Figure 7.13 IP version of inetsw

reality check:

    ip_input:
        (*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen, ip->ip_p);

    udp_output routine:
        error = ip_output(m, inp->inp_options, &inp->inp_route,
            (inp->inp_socket->so_options & (SO_DONTROUTE | SO_BROADCAST)),
            inp->inp_moptions);

    who calls udp_output ... pru_send, udp_send stub,
    (pr_usrreq from socket land)

Figure 7.14. inetsw ... ip entry itself.

    note pr_slowtimo is used and pr_fasttimo is NOT used.
    slow timeout used by reassembly.

Figure 7.15 domaininit

    called at boot.
    add domains
    loop thru domains
        call domain init function
        loop thru protosw
            call protosw init function (pr_init)
    fire up timeouts for fasttimo and slowtimo in 1 clock tick (1/hz sec)

Figure 7.16 ... review ...

Figure 7.19 pfslowtimo routine

    do protocol timeouts
    call timeout again to put back in timeout queue
    timeout is hz/2 ticks: at 100 hz that is 50 ticks, so once per 500ms
    (2 per second)

pffasttimo routine
    (same structure, runs every 200ms)

7.6 pffindproto pffindtype

pffindproto(family=PF_INET, protocol=IPPROTO_TCP, type=SOCK_STREAM)
    find protosw ptr given above
    note: some address families might have > 1 SOCK_STREAM transport

pffindtype(family, type) - find by type

Figure 7.20

pffindtype - find *first* of a type
    find domain ptr ... loop thru domains
    search in that domain for type, return pointer to it

pffindproto - more general, find specific known instance
    similar except
    find domain
    now find specific protocol and type match

    note: default RAW protocol value of 0 (default to do it yourself)

    if we are not finding it
        but search type is SOCK_RAW and we are pointing at
        a SOCK_RAW protosw entry whose protocol is 0 (any protocol ok)
        and maybe isn't set yet
            maybe = ptr to that protocol slot

    therefore:
    pffindproto(PF_INET, 27, SOCK_RAW) returns raw proto
    therefore process can do "protocol 27" on its own.

in terms of socket calls, where used?

7.7 pfctlinput function

used when event occurs that can affect EVERY (domain, protocol).
    1. interface shutdown
    2. routing change
    3. icmp redirect
since this might affect TCP (should start over on path mtu)

UDP has udp_ctlinput
TCP has tcp_ctlinput

7.8 IP Init

ip_init does exist

ip_protox array
    datagram has ip_p -> must map to TCP/UDP/ICMP/IGMP, etc.
    index into above array is exactly that value ...
    ip_protox entries are indexes into inetsw[]
    E.g., udp proto number is 17, ip_protox[17] is 1, the udp slot in inetsw

ip_init called by domaininit

Figure 7.23 ...
    must find raw protocol
    init ip_protox array
        pr - inetsw is ptr arithmetic that results in init to the RAW
        protocol index, therefore everything points to something
    walk thru compile-time domain structure, take pointer to each protosw,
    and set the ip_protox entry for its protocol number to point to it

    ipq ... ip reassembly queue init
    ip_id set from the system clock so is sorta random
    ip_ifmatrix ... for counting packets routed thru various interfaces
        (gone now)

Note bzero/bcopy use ... at times ... why?  why not memcpy, etc.

7.9 sysctl system call

See sysctl(8)
    # sysctl -a (dump everything)
    # sysctl -w variable=1      // set a value

system wide variables (boolean, or integers) for either info, or
for setting various system variables, possibly at boot.

Tree structure to find variable:

E.g., forwarding/routing ON as follows:

    net.inet.ip.forwarding: 1

Internally name is 4 ints: (array indices)
    net, inet, ip, forwarding

sysctl calls:
    net_sysctl dispatches to protocol pr_sysctl
        ip_sysctl | udp_sysctl | tcp_sysctl

E.g., current ip set in FBSD could be:

net.inet.ip.portrange.lowfirst: 1023
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.first: 1024
net.inet.ip.portrange.last: 5000
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.hilast: 65535
net.inet.ip.forwarding: 1
net.inet.ip.redirect: 1
net.inet.ip.ttl: 64
net.inet.ip.rtexpire: 3600
net.inet.ip.rtminexpire: 10
net.inet.ip.rtmaxcache: 128
net.inet.ip.sourceroute: 0
net.inet.ip.intr_queue_maxlen: 50
net.inet.ip.intr_queue_drops: 0
net.inet.ip.accept_sourceroute: 0
net.inet.ip.fastforwarding: 0
net.inet.ip.keepfaith: 0
net.inet.ip.gifttl: 30
net.inet.ip.mvifinput: 1
net.inet.ip.subnets_are_local: 0
net.inet.ip.fw.enable: 1
net.inet.ip.fw.one_pass: 1
net.inet.ip.fw.debug: 1
net.inet.ip.fw.verbose: 1
net.inet.ip.fw.verbose_limit: 100
net.inet.ip.fw.dyn_buckets: 256
net.inet.ip.fw.curr_dyn_buckets: 256
net.inet.ip.fw.dyn_count: 0
net.inet.ip.fw.dyn_max: 1000
net.inet.ip.fw.static_count: 14
net.inet.ip.fw.dyn_ack_lifetime: 300
net.inet.ip.fw.dyn_syn_lifetime: 20
net.inet.ip.fw.dyn_fin_lifetime: 1
net.inet.ip.fw.dyn_rst_lifetime: 1
net.inet.ip.fw.dyn_udp_lifetime: 10
net.inet.ip.fw.dyn_short_lifetime: 5
net.inet.ip.fw.dyn_grace_time: 10
net.inet.ip.maxfragpackets: 436
net.inet.ip.check_interface: 0
net.inet.ipsec.def_policy: 1
net.inet.ipsec.esp_trans_deflev: 1
net.inet.ipsec.esp_net_deflev: 1
net.inet.ipsec.ah_trans_deflev: 1
net.inet.ipsec.ah_net_deflev: 1
net.inet.ipsec.ah_cleartos: 1
net.inet.ipsec.ah_offsetmask: 0
net.inet.ipsec.dfbit: 0
net.inet.ipsec.ecn: 0
net.inet.ipsec.debug: 1
net.inet.ipsec.esp_randpad: -1
------------------------------------------------------------
or for udp:

net.inet.udp.checksum: 1
net.inet.udp.maxdgram: 9216
net.inet.udp.recvspace: 42080
net.inet.udp.log_in_vain: 1
net.inet.udp.blackhole: 0
------------------------------------------------------------

Figure 7.26 net_sysctl

    extract family/protocol
    find domain
    find protocol ...
    if it has pr_sysctl
        call it, with remainder of name array
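
Sketch of the net_sysctl dispatch steps above, as self-contained user-space C. This is NOT the kernel source, just a toy to make the walk concrete. Assumptions (all made up for illustration): every sk_/SK_ name, the trimmed-down structs (only the fields the dispatch needs), the toy ip handler that knows a single variable, the constant values, and the error codes returned. It also assumes the leading "net" int has already been consumed, so the name array starts at the family.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical constants, for illustration only. */
enum { SK_PF_INET = 2, SK_IPPROTO_IP = 0, SK_IPPROTO_UDP = 17 };
enum { SK_IPCTL_FORWARDING = 1 };

/* Trimmed-down stand-in for a protocol switch entry. */
struct sk_protosw {
    int pr_protocol;                       /* protocol number within domain */
    int (*pr_sysctl)(const int *name, unsigned namelen, int *oldval);
};

/* Trimmed-down stand-in for a domain. */
struct sk_domain {
    int dom_family;
    struct sk_protosw *dom_protosw;        /* first protosw entry */
    struct sk_protosw *dom_protosw_end;    /* one past the last entry */
    struct sk_domain *dom_next;            /* next domain in the list */
};

static int sk_ipforwarding = 1;            /* toy "net.inet.ip.forwarding" */

/* Toy protocol handler: sees only the remainder of the name array. */
static int sk_ip_sysctl(const int *name, unsigned namelen, int *oldval)
{
    if (namelen != 1)
        return ENOTDIR;
    if (name[0] == SK_IPCTL_FORWARDING) {
        *oldval = sk_ipforwarding;
        return 0;
    }
    return EOPNOTSUPP;
}

static struct sk_protosw sk_inetsw[] = {
    { SK_IPPROTO_IP,  sk_ip_sysctl },
    { SK_IPPROTO_UDP, NULL },              /* no pr_sysctl in this sketch */
};

static struct sk_domain sk_inetdomain = {
    SK_PF_INET, sk_inetsw, sk_inetsw + 2, NULL
};

/* name[] = { family, protocol, protocol-specific ints... } */
static int sk_net_sysctl(struct sk_domain *domains, const int *name,
                         unsigned namelen, int *oldval)
{
    struct sk_domain *dp;
    struct sk_protosw *pr;

    /* Need family, protocol, and at least one more component. */
    if (namelen < 3)
        return EISDIR;

    /* Find the domain for the family. */
    for (dp = domains; dp != NULL; dp = dp->dom_next)
        if (dp->dom_family == name[0])
            break;
    if (dp == NULL)
        return ENOPROTOOPT;

    /* Find the protocol; if it has pr_sysctl, pass it the rest of the name. */
    for (pr = dp->dom_protosw; pr < dp->dom_protosw_end; pr++)
        if (pr->pr_protocol == name[1] && pr->pr_sysctl != NULL)
            return (*pr->pr_sysctl)(name + 2, namelen - 2, oldval);

    return ENOPROTOOPT;
}
```

Usage idea: with name = { SK_PF_INET, SK_IPPROTO_IP, SK_IPCTL_FORWARDING } and namelen = 3, sk_net_sysctl(&sk_inetdomain, ...) ends up in sk_ip_sysctl with a one-int name. A protocol with no pr_sysctl (udp in this toy table) falls through to the error return.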