In the remainder of this article, I will attempt to dispel as much as possible of the misinformation presented in the original Video RAM Caching article. In the remaining four sections, I will cover the major areas in which the original article is, for lack of a better term, just plain wrong. Section 2 will calculate more accurate values for memory, cache, and AGP bandwidth to the CPU. Section 3 will explain the relationship of the CPU's L2 cache to the remainder of the system. Section 4 will explain the actual significance of the A0000h-AFFFFh memory region, and Section 5 will conclude the paper.
Using the same calculations, the raw bandwidth of main memory on a system with a 100MHz bus (i.e., one based on the Intel 440BX chipset) would be 800 MB/s. On a Celeron CPU, the L2 cache operates at the clock speed of the CPU. On a 500MHz Celeron, the L2 cache bandwidth to the CPU is, theoretically, 4 GB/s. These values match the values calculated by Wong.
However, this neglects one other important bandwidth calculation. What about the speed of the connection from the video memory to the rest of the system? On a 440BX chipset system this will most likely be via the AGP bus. According to Intel's AGP Technology Overview[2], the AGP bus as implemented by the 440BX chipset has a peak bandwidth of 528 MB/s. For comparison, the ISA bus is 16 bits wide, operates at one quarter of the speed of the PCI bus, and has a peak bandwidth of about 16 MB/s.
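The figures quoted above all come from the same arithmetic: bus width in bytes, times clock rate, times transfers per clock. The following Python sketch reproduces them. The function name and the specific bus widths and clock rates (a 64-bit memory bus, a 64-bit L2 interface, 32-bit AGP at 66 MHz with two transfers per clock, and ISA at one quarter of a 33 MHz PCI clock) are assumptions chosen to be consistent with the numbers quoted in this article, not values taken from it.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth in MB/s, with 1 MB taken as 10**6 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# Assumed widths and clocks; see the paragraph above for where they come from.
buses = {
    "main memory (100 MHz, 64-bit)": peak_bandwidth_mb_s(64, 100),
    "Celeron L2 (500 MHz, 64-bit)": peak_bandwidth_mb_s(64, 500),
    "AGP on 440BX (66 MHz, 32-bit, 2 transfers/clock)": peak_bandwidth_mb_s(32, 66, 2),
    "ISA (33 MHz / 4, 16-bit)": peak_bandwidth_mb_s(16, 33 / 4),
}

for name, mb_s in buses.items():
    print(f"{name}: {mb_s:g} MB/s")
```

These are peak figures only. Sustained rates on real hardware are lower, since none of these buses can transfer data on every clock indefinitely.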
This is quite different from the architecture of the Pentium, 486, or 386 CPUs. For these processors the L2 cache was controlled entirely by the motherboard chipset, and was, for the most part, transparent to the CPU. This distance from the CPU limited the bandwidth of the cache-to-CPU connection. In the case of the Pentium, the bandwidth was limited to about 528 MB/s. This is a far cry from the 4 GB/s available on today's Celeron processors. The closeness of the cache forces the chipset to view the CPU and the L2 cache as a single entity. This can be seen in the block diagram at [2]. There are two important issues here that are absent from Wong's article.
The whole basis of Wong's article is that enabling "video RAM cacheable" in the BIOS would somehow cause transfers from main memory to video memory to go through the CPU cache and therefore operate at 4 GB/s. This entire theory is contrary to how the system actually functions.
What does this setting do? It dates back to the old days when the video card was on the ISA bus. At that time there were no AGP transfers and no PCI transfers. If data had to be moved from main memory to video memory, it was moved directly by the CPU. Moreover, most video cards of that time were "dumb" framebuffers. They did not draw lines. They did not clear memory. They just displayed pixels on the monitor. All drawing was done by the CPU.
While the CPU was doing all of this drawing, it often needed to read back data that it had already written to video memory. This could be very time-consuming. Remember that the bandwidth to video memory had a peak of about 16 MB/s. By caching the data in the L2 or the L1 cache, some graphics algorithms could be greatly sped up.
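To get a rough sense of scale, the sketch below estimates how long it would take to read back a block of pixels at the ISA peak rate compared with a cache-speed rate. The 64x64 block of 8-bit pixels and the function name are illustrative assumptions. The 16 MB/s and 528 MB/s figures are the ISA and Pentium cache peaks quoted earlier in this article.

```python
def transfer_time_us(num_bytes, bandwidth_mb_s):
    """Microseconds needed to move num_bytes at a peak rate given in MB/s.

    With 1 MB = 10**6 bytes and 1 s = 10**6 us, the two scale factors cancel,
    so the result is simply bytes divided by MB/s.
    """
    return num_bytes / bandwidth_mb_s


# Hypothetical workload: a 64x64 block at 1 byte per pixel.
block_bytes = 64 * 64 * 1

isa_us = transfer_time_us(block_bytes, 16)     # read back over the ISA bus
cache_us = transfer_time_us(block_bytes, 528)  # read back from a Pentium-era cache

print(f"ISA read-back:   {isa_us:.1f} us")
print(f"Cache read-back: {cache_us:.2f} us")
print(f"Ratio:           {isa_us / cache_us:.0f}x")
```

Both numbers assume peak transfer rates, so the absolute times are optimistic. The ratio between them is what matters.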
However, he attributes this to the size of the L2 cache and system bus bandwidth. In fact, the real reason is that this BIOS setting has nothing whatsoever to do with an AGP-based system. Unless the system has an ISA-based video card, this BIOS setting is totally meaningless.