Linux kernel internals book: chapter 2: memory addressing
----------------------------------------------------------
outline: IA segmentation overview and how linux uses it; paging and other hw,
and how linux uses paging. 80x86 (Intel Architecture) oriented.

memory mgmt in the book:
  chapter 2: memory addressing
  chapter 7: how the kernel allocates memory to itself
  chapter 8: how "linear addresses" are assigned to processes
  note: "linear address" is Intel-speak for virtual address.

3 kinds of addresses:
  1. logical address (segmented addressing)
     a segmented address is the pair: which segment + which offset within it.
     we load a segmentation register, and instructions are relative to that
     segment (e.g., one segment for code, one for data). this allows code to
     be relocated in memory.
  2. linear address (virtual address)
     a 32-bit address, can address 4G (2**32). in hex: 0x00000000 .. 0xffffffff.
     a flat linear address space can be viewed as one VERY big segment.
  3. physical address
     32-bit integers, used to address the ram chips themselves.
     why would it be bad to encode physical addresses in a program?

traditionally we think of 2 translation steps:
  logical (program) address -> segmentation unit -> linear (virtual) address
  linear address -> paging hardware -> physical address

multiprocessor system: more than 1 cpu accesses all ram the same way, so a
memory arbiter is needed between the cpus and ram, because ram must be
accessed serially; the arbiter grants access. even a single-cpu system needs
one, because DMA controllers transfer data in parallel with the cpu.

segmentation
------------
the 386 provides two modes:
  1. real mode: used by pre-VM systems, still used at boot (the cpu
     initializes to that state; setup code then configures the mmu and
     switches to protected mode)
  2. protected mode

logical address = segment part + offset part
  segment selector: 16 bits
  offset: 32 bits

6 segment registers: cs, ss, ds, es, fs, gs. they only hold a segment selector.
  cs - code segment, points to code. also contains the current privilege
       level: 0 = highest privilege (kernel mode), 3 = lowest (user mode)
  ss - stack segment, points to the stack
  ds - data segment, points to data
  the other 3 are general purpose

segment descriptors are stored in either:
  GDT - global descriptor table; the gdtr register points to it
  LDT - local descriptor table; the ldtr register points to it
  one global GDT; one LDT per process

segment descriptor:
  32 bits of base: linear address of the 1st byte of the segment
  20 bits of limit: segment size, in bytes or in 4k units
    (20 + 12 == 32, therefore segment size ranges from 4k up to 4G)
  G granularity flag: if 0, the limit is counted in bytes, else in 4k units
  S system flag: if cleared, the segment is a system segment (holds critical
    data structures); otherwise it is an ordinary code or data segment
  4-bit type field, distinguishing among others:
    code segment descriptor
    data segment descriptor
    task state segment descriptor (the TSS is a save area for registers;
      appears in the GDT only)
    ldt descriptor (points to an LDT table; GDT only)

segment register (segment selector) -> table of segment descriptors ->
segment/memory mapping. the descriptor is cached in a non-programmable
register: each time a segment register is loaded, its segment descriptor is
also loaded -- this provides faster access and lets us avoid touching the GDT
or LDT on every memory reference.

segment selector:
  13-bit segment index, selects a segment descriptor
  1 bit to indicate GDT or LDT
  2-bit privilege level field (never mind)
  the index maps to a descriptor by simple arithmetic: e.g., if the index is
  2 and the GDT is used, the descriptor lives at the GDT base address plus
  2 * 8. (a small C sketch of this lookup appears at the end of this
  segmentation section.)
  the 1st entry of the GDT is always null, so a memory access made through a
  segment register accidentally left at 0 causes an exception.

refer to figure 2-4: this is segmentation: how we turn a logical address into
a virtual address. note: segment/page/offset theory is always important, no
matter what the architecture is.
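to make the selector -> descriptor -> linear address arithmetic concrete,
here is a minimal C sketch of what the segmentation unit does. it is
illustrative only: the struct layout and the function name are made up for
the example, the descriptors are shown already decoded into a flat array, and
the LDT, privilege checks, and the cached non-programmable registers are
ignored.

    /* sketch only: resolve a logical address (selector:offset) to a linear
     * address, assuming a flat array of already-decoded descriptors. */
    #include <stdint.h>

    struct seg_descriptor {
        uint32_t base;            /* linear address of the 1st byte of the segment */
        uint32_t limit;           /* 20-bit limit field                            */
        int      granularity_4k;  /* G flag: is the limit counted in 4k units?     */
    };

    /* the upper 13 bits of the selector are the descriptor index; each
     * descriptor is 8 bytes, so hardware fetches gdt_base + index * 8. */
    uint32_t logical_to_linear(const struct seg_descriptor *gdt,
                               uint16_t selector, uint32_t offset)
    {
        const struct seg_descriptor *d = &gdt[selector >> 3];
        uint64_t size = d->granularity_4k ? ((uint64_t)d->limit + 1) << 12
                                          :  (uint64_t)d->limit + 1;

        if (offset >= size)
            return 0;             /* real hardware would raise a protection fault */
        return d->base + offset;  /* the linear (virtual) address */
    }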
segmentation in linux
---------------------
theory: segmentation could allow 1 function per segment. this is a totally
wacky theory. history of unix: segments were used for text, data, and stack,
with a possible o.s. segment for the per-process kernel stack; all the
functions go in text (early unix on the pdp-11).

if you have enough address bits, segmentation can give you large segments,
but swapping LARGE segments in/out is not *efficient*, thus swapping
fixed-size pages is favored. however, *segmentation* might still be used for
privilege/access protection, e.g., why not make the stack segment
non-executable to help prevent buffer-overflow attacks?

all processes use the same logical addresses. the kernel uses the gdt ... in
fact the gdt has enough segments that we could probably just use it and get
by. the kernel does not use LDTs, although a user process can install one
with modify_ldt().

segments used by linux:
  kernel code segment - can be read and executed, covers all memory
  kernel data segment - can be read and written, covers all memory
  user code segment
  user data segment
  a task state segment for each cpu (basically an array of saved registers)
  a default ldt table
  four segments of the GDT for APM support

paging in hw
------------
functions of the paging unit:
  translate a linear address into a physical address
  check access permissions

pages are contiguous fixed-size chunks (segments are variable sized). 4k is
the usual size ... easy to write to/from primary and secondary storage. ram
and swap space are both partitioned into page frames: frame 0, frame 1,
frame 2, ... frame n, each a 4k chunk.

the 80x86 paging architecture splits a linear address into a 3-tuple:
  directory + table + offset
  10 bits   + 10 bits + 12 bits (4k page) = 32 bits
there are 2 address-translation steps:
  1. via the page directory, which is a table of page tables
  2. via a page table, whose entries point to page frames
  the offset is then used within the selected page frame itself.
(a small C sketch of this split follows below.)

why two levels? because page tables live in process context, and if we set up
a process a priori with a PTE (page table entry) for every possible page, the
memory overhead would be very costly: with a 12-bit offset (4k page) we would
need 2**20 entries to cover the whole 2**32 address space, per process. with
10 bits per level, each directory/table has only 1k entries, and page tables
are allocated only for the regions a process actually uses.
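to see the 10/10/12 split concretely, here is a minimal C sketch, assuming
plain 32-bit (non-PAE) 2-level paging with 4k pages; the variable names and
the example address are arbitrary, not kernel macros.

    /* decompose a 32-bit linear address the way 80x86 2-level paging does */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t linear = 0x0804a1c4;              /* arbitrary example address   */
        uint32_t dir    = (linear >> 22) & 0x3ff;  /* index into page directory   */
        uint32_t table  = (linear >> 12) & 0x3ff;  /* index into the page table   */
        uint32_t offset =  linear        & 0xfff;  /* byte offset inside the page */

        /* the directory entry selects one of 1k page tables; the table entry
         * selects one of 1k page frames; the offset is used unchanged. */
        printf("dir=%u table=%u offset=0x%03x\n",
               (unsigned)dir, (unsigned)table, (unsigned)offset);
        return 0;
    }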
hardware protection in paging is coarser than in segmentation: a page is
either read/write or read-only.

3-level paging: linux adopted a deeper hierarchy because it wants to work on
64-bit architectures, where there is a LOT of address space and a 2-level
hierarchy is insufficient. e.g., choose a 16k page, therefore 14 bits of
offset, leaving 50 bits for the rest of the address; 25 bits in each of 2
table levels means 2**25 possible entries, about 32 million, per table. too
big. the alpha does it as follows: page frames are 2**13 = 8k; only the least
significant 43 bits of the address are used, leaving 30 bits above the
offset; 3 levels of 10 bits apiece, so 1k entries per table.

physical address extension paging mechanism (PAE): the 386 through the
Pentium have 32-bit addresses, but linear-address limitations mean the kernel
can only directly handle about 1G, even though 4G is possible. the Pentium
Pro and later can translate 32-bit linear addresses into 36-bit physical
addresses == 64G of ram; a new level of the page-table hierarchy was
introduced for this. the problem is that the programmable (linear) address
space is still only 2**32 per process.

hardware cache
--------------
cpu registers are always faster than memory, therefore we need to cache
chunks of instructions/data. the principle of locality is important here:
code may sit in a loop, and branches may be costly. Intel introduced the
cache line, the set of bytes transferred as a unit between DRAM and the fast
cache memory. a fully associative cache means a line from DRAM can be placed
anywhere in the cache; in an N-way set-associative cache a line can only go
in certain places.

reading is less tricky than writing. two write policies:
  write-through: writes go to the cache and back to ram immediately
  write-back: writes update only the cache line; ram is updated later, on an
  interesting cache event, for example a cache miss or a flush
on a multi-cpu machine there is one cache per cpu, so cache snooping is
needed: if one cpu modifies its cache, the other cpus' caches may need to be
updated too. in linux, caching is enabled for all page frames, and the
write-back strategy is always used.

translation lookaside buffers (TLBs) exist to speed up linear-address
translation: it would be very inefficient to walk the page tables in ram on
every access, so the TLB is basically a cache of recent linear-to-physical
translations. in a multi-cpu system, each cpu has its own TLB.

paging in linux
----------------
linux uses a 4-level hierarchy so that it can be portable to 64-bit
architectures. the hw cr3 register points to the page global directory; an
address is then resolved through global directory, upper directory, middle
directory, page table, and offset. (a small C sketch of such a walk closes
these notes.)

it is desirable to:
  1. assign a different physical address space to each process, thereby
     minimizing the possibility of addressing errors
  2. distinguish pages (data) from page frames (physical addresses), so the
     same page can be loaded into frame X, Y, or Z as needed.

each process has its own page global directory and its own set of page
tables. at a process switch, linux saves cr3 in the process descriptor and
loads the new process's cr3. on a pentium there is only a 2-level hw
hierarchy, so linux folds the extra levels away (gives their index fields 0
bits); the middle directory does come into use with the PAE mechanism,
however.
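as a rough illustration of what a 4-level walk looks like in software, here
is a hedged C sketch. it is not the kernel's pgd/pud/pmd/pte API: the
9-bits-per-level / 4k-page split is assumed (the x86-64 layout), each table
is modelled as a plain array of pointers, and a NULL entry stands in for a
"not present" entry that would normally trigger a page fault.

    /* sketch: walk a 4-level page-table tree (global -> upper -> middle ->
     * page table) and return frame base + offset, or 0 if unmapped. */
    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT 12                 /* 4k pages               */
    #define PTRS       512                /* 2**9 entries per level */

    static size_t idx(uint64_t va, int level)  /* level 3 = global .. 0 = page table */
    {
        return (va >> (PAGE_SHIFT + 9 * level)) & (PTRS - 1);
    }

    uint64_t translate(void **global_dir, uint64_t va)
    {
        void **upper   = global_dir ? global_dir[idx(va, 3)] : NULL;
        void **middle  = upper      ? upper[idx(va, 2)]      : NULL;
        void **table   = middle     ? middle[idx(va, 1)]     : NULL;
        uint64_t frame = table ? (uintptr_t)table[idx(va, 0)] : 0;

        if (!frame)
            return 0;                                    /* unmapped: page fault */
        return frame + (va & ((1u << PAGE_SHIFT) - 1));  /* frame base + offset  */
    }

at a process switch the kernel in effect swaps which top-level table a walk
like this starts from, which is what reloading cr3 does in hardware.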