Improvements in disk access latency have lagged far behind the rapid advance of other computing resources, such as CPU speed, memory bandwidth, and network bandwidth. The “memory wall” performance bottleneck has been replaced by the “disk wall”, a serious bottleneck for many data-intensive applications. One barrier to speeding up data access from disks is the limited ability of operating systems to exploit sequential locality. For the same amount of data, sequential disk access is several orders of magnitude faster than random access, so there is a large potential for performance improvement.
We have designed and built basic operating system infrastructure called DiskSeen, which puts disk layout information into the OS map. With DiskSeen, we are able to exploit dual localities (DULO): the temporal locality of workload execution patterns and the sequential locality of the disks. This talk will present two new buffer-management techniques, DULO-Caching and DULO-Prefetching. DULO-Caching holds frequently used but randomly accessed data in the buffer cache to avoid slow disk accesses, and evicts sequentially accessed but infrequently used data in a timely fashion, because such data can be re-read quickly through sequential disk access. DULO-Prefetching adaptively preloads into the buffer cache sequentially stored disk data blocks that belong to multiple files, which significantly improves prefetching efficiency.
We have implemented DiskSeen with DULO-Caching and DULO-Prefetching in version 2.6.11 of the Linux kernel and have evaluated their performance on various data-intensive workloads. We show their effectiveness and low overhead in a practical system environment.
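The caching idea in the abstract can be illustrated with a small sketch. This is a toy model written for this summary, not the DULO-Caching algorithm and not code from the Linux implementation: the class name `ToyDualLocalityCache`, the `window` parameter, and the use of contiguous block numbers as a stand-in for on-disk sequentiality are all assumptions. Among the least recently used blocks, the sketch prefers to evict one that sits in the longest run of contiguous cached block numbers, on the assumption that such a block is cheap to re-read sequentially.

```python
from collections import OrderedDict


def run_length(block, cached):
    """Length of the run of consecutive block numbers in `cached` that contains `block`."""
    lo = block
    while lo - 1 in cached:
        lo -= 1
    hi = block
    while hi + 1 in cached:
        hi += 1
    return hi - lo + 1


class ToyDualLocalityCache:
    """Hypothetical cache combining recency with a crude sequentiality measure."""

    def __init__(self, capacity, window=4):
        self.capacity = capacity      # maximum number of cached blocks
        self.window = window          # how many of the oldest blocks are eviction candidates
        self.blocks = OrderedDict()   # block number -> None, least recently used first

    def access(self, block):
        """Record an access; return True on a hit and False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # refresh recency on a hit
            return True
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[block] = None
        return False

    def _evict(self):
        # Consider only the least recently used blocks (temporal locality).
        candidates = list(self.blocks)[: self.window]
        # Prefer the candidate in the longest contiguous run (sequential locality).
        # max() returns the first maximal item, so ties go to the older block.
        victim = max(candidates, key=lambda b: run_length(b, self.blocks))
        del self.blocks[victim]


if __name__ == "__main__":
    cache = ToyDualLocalityCache(capacity=4, window=4)
    for b in (7, 100, 53, 101, 900):
        cache.access(b)
    print(sorted(cache.blocks))
```

With this access sequence, plain LRU would evict block 7, the oldest entry. The sketch instead evicts block 100, which is adjacent to the cached block 101, and keeps the isolated block 7 that would need a random disk access to fetch again.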
Xiaodong Zhang is the Robert M. Critchfield Professor in Engineering, and Chairman of the Department of Computer Science and Engineering at the Ohio State University.
His research interests cover a wide spectrum in the areas of high-performance and distributed systems. Several technical innovations and research results from his team have been adopted or are being developed in commercial products and open source systems, with direct impact on some core computing operations. These include the permutation memory interleaving technique, first used in Sun Microsystems’ UltraSPARC IIIi processor and then in Sun’s dual-core Gemini processor, as well as the token-thrashing protection mechanism and the Clock-Pro page-replacement algorithm for memory management in the Linux kernel and NetBSD.
Xiaodong Zhang was the Director of the Advanced Computational Research Program at the National Science Foundation from 2001 to 2004. He is the Associate Editor-in-Chief of IEEE Transactions on Parallel and Distributed Systems, and also serves on the editorial boards of IEEE Transactions on Computers, IEEE Micro, and the Journal of Parallel and Distributed Computing.
He received his Ph.D. in Computer Science from the University of Colorado at Boulder.
Fei Xie