Video Memory Caching

Video Memory Caching Myths Dispelled
By Ian Romanick, idr@cs.pdx.edu
11-January-2000

Section 1: Introduction

Achieving ultimate system performance has always been a goal of bleeding-edge computer owners. Like the muscle car guys of earlier decades, they investigate every possible way, however minor, to tweak system performance. When a new technique is discovered, the discoverer invariably writes an article to post to the Internet. The article by Adrian Wong on Video RAM Caching is one such article. Like much of the information available on the Internet, it shows a fundamentally flawed understanding of the technology, terminology, and methodology at hand.

In the remainder of this article, I will attempt to dispel as much as possible of the misinformation presented in the original Video RAM Caching article. The remaining four sections cover the major areas in which the original article is, for lack of a better term, just plain wrong. Section 2 will calculate more accurate values for memory, cache, and AGP bandwidth to the CPU. Section 3 will explain the relationship of the CPU's L2 cache to the remainder of the system. Section 4 will explain the actual significance of the A0000h-AFFFFh memory region, and Section 5 will conclude the paper.

Section 2: Bandwidth Calculations

Since video-based applications, such as 3D games, are inherently memory intensive, available memory bandwidth can be used to set a reasonable theoretical upper bound on actual performance. As pointed out in the Wong article, a typical Q4 1999 PC video card has 16MB to 32MB of RAM on a 128-bit 150MHz bus. The actual figures vary, but 128 bits and 150MHz are fairly representative values. This provides a raw memory bandwidth on the order of 1.2 GB/s. This is comparable with the maximum texel read rate of 333 Mtex/s of, for example, the Voodoo3 3000[1].

Using the same calculations, the raw bandwidth of main memory on a system with a 100MHz bus (i.e., one based on the Intel 440BX chipset) would be 800 MB/s. On a Celeron CPU, the L2 cache operates at the clock speed of the CPU. On a 500MHz Celeron CPU, the L2 cache bandwidth to the CPU is, theoretically, 4 GB/s. These values match the values calculated by Wong.
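The main memory and L2 figures can be reproduced with a simple width-times-clock product. The following is only a sketch: the function name is made up for illustration, and the 64-bit data path widths are assumptions chosen because they reproduce the 800 MB/s and 4 GB/s values quoted above; the article itself does not state them.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth in MB/s, taking 1 MB as 10**6 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# Assumption: 64-bit main memory data path on a 100 MHz bus.
main_memory = peak_bandwidth_mb_s(64, 100)

# Assumption: 64-bit L2 data path clocked at the 500 MHz core speed.
l2_cache = peak_bandwidth_mb_s(64, 500)

print(main_memory)  # 800.0
print(l2_cache)     # 4000.0
```

These are ceilings only. Real transfers lose cycles to addressing, refresh, and arbitration, so sustained rates will be lower.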

However, this neglects one other important bandwidth calculation: the speed of the connection from the video memory to the rest of the system. On a 440BX chipset system this will most likely be the AGP bus. According to Intel's AGP Technology Overview[2], the AGP bus as implemented by the 440BX chipset has a peak bandwidth of 528 MB/s. For comparison, the ISA bus is 16 bits wide, operates at 1/4th the speed of the PCI bus, and has a peak bandwidth of about 16 MB/s.
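The same arithmetic gives the AGP and ISA figures. This is a rough sketch under stated assumptions: a 32-bit AGP bus at a nominal 66 MHz with two transfers per clock, and a nominal 33 MHz PCI clock for deriving the ISA clock. None of these parameters appear in the article; they are picked because they are consistent with its 528 MB/s and "about 16 MB/s" values.

```python
def peak_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Assumption: nominal PCI clock of 33 MHz; ISA runs at one quarter of it.
PCI_CLOCK_MHZ = 33.0
ISA_CLOCK_MHZ = PCI_CLOCK_MHZ / 4

# Assumption: 32-bit AGP, 66 MHz, two transfers per clock.
agp = peak_mb_s(32, 66, transfers_per_clock=2)

# Idealized ISA: 16 bits moved on every clock, which real hardware does not sustain.
isa = peak_mb_s(16, ISA_CLOCK_MHZ)

print(agp)  # 528.0
print(isa)  # 16.5
```

Even under these generous assumptions, the AGP link is slower than the 800 MB/s main memory bus and far slower than the 4 GB/s L2 figure.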

Section 3: L2 Cache and Multimaster Buses

In modern systems, the L2 cache is tightly coupled with the CPU and is critical for optimal system performance. In the case of the Intel Celeron processor, the L2 cache is so tightly coupled with the CPU that it resides on the same piece of silicon. The net result is that, with the exception of cache coherency protocols, virtually nothing outside of the CPU can directly access or modify the L2 cache.

This is quite different from the architecture of the Pentium, 486, or 386 CPUs. For these processors the L2 cache was controlled entirely by the motherboard chipset, and was, for the most part, transparent to the CPU. This distance from the CPU limited the bandwidth of the cache-to-CPU connection. In the case of the Pentium, the bandwidth was limited to about 528 MB/s. This is a far cry from the 4 GB/s available on today's Celeron processors. On the Celeron, the closeness of the cache forces the chipset to view the CPU and the L2 cache as a single entity. This can be seen in the block diagram at [2]. Two important points follow from this, both absent from Wong's article:

1. Transfers from the CPU to video memory must pass through the system bus (800 MB/s) and the AGP bus (528 MB/s), and are therefore limited to the speed of the slowest bus.
2. Transfers from main memory directly to video memory (i.e., texture transfers) never pass through the CPU cache.
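The "slowest bus wins" rule can be written down in a few lines. This is an illustrative model only: the hop names and the `path_bandwidth` helper are hypothetical, and the numbers are the peak figures from Section 2.

```python
# Peak figures from Section 2, in MB/s. Hop names are illustrative labels.
PEAK_MB_S = {
    "l2_cache": 4000,
    "system_bus": 800,
    "main_memory": 800,
    "agp": 528,
}


def path_bandwidth(hops):
    """Upper bound for a transfer that crosses every hop in sequence."""
    return min(PEAK_MB_S[hop] for hop in hops)


# Point 1: CPU writes to video memory cross the system bus, then AGP.
cpu_to_video = path_bandwidth(["system_bus", "agp"])

# Point 2: texture transfers go from main memory over AGP; the L2 is not on the path.
texture_upload = path_bandwidth(["main_memory", "agp"])

# Hypothetical: even if the L2 were on the path, the slowest hop would still govern.
with_l2 = path_bandwidth(["l2_cache", "system_bus", "agp"])

print(cpu_to_video, texture_upload, with_l2)  # 528 528 528
```

Under this model, no transfer into video memory can exceed the AGP figure, whatever the cache bandwidth on the CPU side.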

The whole basis of Wong's article is that enabling "video RAM cacheable" in the BIOS would somehow cause transfers from main memory to video memory to go through the CPU cache and therefore operate at 4 GB/s. This entire theory is contrary to how the system actually functions.

Section 4: ISA Bus Video Memory Region

By examining the system settings on a PC based on the 440BX chipset with a Riva TNT2 AGP video card, it can be seen that video memory is mapped to the address range EA000000h-EBFFFFFFh. It should be obvious that this has absolutely nothing to do with the A0000h-AFFFFh range mentioned in the "video RAM cacheable" BIOS setting.
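A quick check of the two ranges makes the point concrete. This is a small sketch; the helper names are invented for illustration, and the addresses are the ones quoted above.

```python
def range_size(addr_range):
    """Number of bytes in an inclusive (start, end) address range."""
    start, end = addr_range
    return end - start + 1


def ranges_overlap(a, b):
    """True if two inclusive (start, end) ranges share any address."""
    return a[0] <= b[1] and b[0] <= a[1]


agp_video_memory = (0xEA000000, 0xEBFFFFFF)   # mapping observed on the TNT2 system
legacy_vga_window = (0xA0000, 0xAFFFF)        # range named by the BIOS setting

print(range_size(agp_video_memory) // (1024 * 1024))         # 32 (MB)
print(range_size(legacy_vga_window) // 1024)                 # 64 (KB)
print(ranges_overlap(agp_video_memory, legacy_vga_window))   # False
```

The BIOS setting covers a 64 KB window below 1 MB, while the card's memory occupies 32 MB near the top of the 32-bit address space; the two do not intersect.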

What does this setting do? It dates back to the old days when the video card was on the ISA bus. At that time there were no AGP transfers and no PCI transfers. If data was to be moved from main memory to video memory, it was moved directly by the CPU. Moreover, most video cards of that time were "dumb" framebuffers. They did not draw lines. They did not clear memory. They just displayed pixels on the monitor. All drawing was done by the CPU.

While the CPU was doing all of this drawing, it was often necessary to read back data that it had already written to video memory. This could be very, very time consuming. Remember that the bandwidth to video memory had a peak of about 16 MB/s. By caching the data in the L2 or the L1 cache, some graphics algorithms could be greatly sped up.

Section 5: Conclusion

Wong presents the results of several benchmarks in his article. In each of these benchmarks the performance difference is less than 1%. In any sort of reasonable performance test, a difference of 1% or less can easily be attributed to statistical error. His tests show, and he concludes, that this BIOS setting has essentially no real impact.

However, he attributes this to the size of the L2 cache and system bus bandwidth. In fact, the real reason is that this BIOS setting has nothing whatsoever to do with an AGP-based system. Unless the system has an ISA-based video card, this BIOS setting is totally meaningless.

References

[1] Voodoo3 3000 AGP Product Specification Sheet
[2] Intel AGP Technology Overview
[3] Video RAM Caching, by Adrian Wong