The Wayback Machine - https://web.archive.org/web/20030224080629/http://www.aceshardware.com:80/read.jsp?id=140

The Quest for Better Performance, Part 2
By Johan De Gelas
Sunday, April 23, 2000 3:39 PM EDT

Performance: Chipset, FSB, and Memory

In our previous article, we hunted for performance bottlenecks and discovered quite a few interesting things. The bottlenecks are the bandwidth and latency of the memory and the L2 cache. In this article we will try to remove them. One of our solutions is faster memory, so we take a closer look at the performance of different types of memory.

Is Rambus really that bad? What about DDR SDRAM? VC SDRAM? How will future technology affect the performance of new CPUs like Thunderbird and Willamette? Is a 200 MHz, 1.6 GB/s FSB useless without fast memory? Let us find out!

Latency and Bandwidth

Latency, expressed in clock cycles, is the time between a request for data and the moment that data is sent out. It takes some time before the memory address is decoded and the data is ready to be sent back. Bandwidth is, of course, the number of megabytes that can be sent per second through the memory bus. Contrary to popular belief, latency and bandwidth are very closely related. If the latency for the first word (4 bytes) is very high, the effective bandwidth is lower, especially with random memory accesses.

For example, the peak bandwidth of RAMBUS PC800 is 1600 MB/s. But with random memory accesses the first 4 bytes arrive after 11 cycles, and a typical 32-byte transfer (one 32-byte cache line sent to the CPU) takes 11-1-1-1 cycles, or 14 cycles in total. If the FSB runs at 133 MHz, the bandwidth for random memory accesses to the CPU is 32 bytes x 133 MHz / 14 cycles = 304 MB/s.
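The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the text; the function name and its parameters are made up for illustration and do not come from any benchmark tool.

```python
def effective_bandwidth_mb_s(burst_timing, bytes_per_line, bus_mhz):
    """Effective bandwidth in MB/s for a single cache-line fill.

    burst_timing   -- cycles spent on each bus transfer, e.g. (11, 1, 1, 1)
    bytes_per_line -- bytes delivered by the complete burst
    bus_mhz        -- bus clock in MHz (bytes * MHz / cycles gives MB/s)
    """
    total_cycles = sum(burst_timing)
    return bytes_per_line * bus_mhz / total_cycles


# RAMBUS PC800 figures quoted in the text: 11-1-1-1 on a 133 MHz FSB.
rambus_random = effective_bandwidth_mb_s((11, 1, 1, 1), 32, 133)
print(f"RAMBUS PC800, random 32-byte fills: {rambus_random:.0f} MB/s")
```

With 14 cycles per 32-byte line, the result is roughly a fifth of the 1600 MB/s peak.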

SDRAM PC133 will do better in those circumstances (random accesses). It takes 7-1-1-1 cycles to transfer a 32-byte line to the CPU's cache, so the CPU will receive 32 bytes x 133 MHz / 10 cycles = about 426 MB/s. If the memory accesses are more sequential, however, then the initial latency is not so important. For example, if we can read 64 bytes sequentially, we have 11-1-1-1 for the first 32 bytes (taking the 11-cycle initial latency from the RAMBUS example) but only 4 cycles (1-1-1-1, simplified) for the next 32 bytes. So the bandwidth comes closer to the peak: 64 bytes x 133 MHz / 18 cycles = 473 MB/s. Bursts of memory traffic with sequential accesses lower the influence of the initial latency, and the average bandwidth to the CPU rises.
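To see how quickly sequential accesses hide the initial latency, the same simplified model can be extended to longer runs. The sketch below assumes, as the text does, that every cache line after the first costs exactly 4 cycles (1-1-1-1); the function name and defaults are hypothetical.

```python
def streaming_bandwidth_mb_s(first_latency, lines, bus_mhz=133,
                             bytes_per_line=32, transfers_per_line=4):
    """Bandwidth in MB/s for `lines` consecutive cache lines.

    The first line costs first_latency plus one cycle for each remaining
    transfer; every later line costs one cycle per transfer (simplified).
    """
    first_line_cycles = first_latency + (transfers_per_line - 1)
    later_cycles = (lines - 1) * transfers_per_line
    return lines * bytes_per_line * bus_mhz / (first_line_cycles + later_cycles)


# 11-cycle initial latency, growing run lengths.
for n in (1, 2, 4, 16, 64):
    print(f"{n:3d} lines: {streaming_bandwidth_mb_s(11, n):7.1f} MB/s")
```

In this model the bandwidth climbs toward 32 bytes x 133 MHz / 4 cycles = 1064 MB/s as the run gets longer, but it never quite reaches it, because the initial latency is always paid once.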

Why RAMBUS Fails!

Astute readers have already figured out why systems with RAMBUS fail to show better performance than systems with PC133 SDRAM. First of all, the FSB of the PIII is limited to 133 MHz, which limits the bandwidth to the CPU to 1066 MB/s. The only way that the higher bandwidth of RAMBUS PC800 can be used is through AGP texturing. As we have shown in previous articles (here and here), AGP texturing is not very popular with game developers because it is, even with AGP 4x, incredibly slow! In other words, the usable bandwidth of PC800 RAMBUS is limited to 1066 MB/s, in theory the same as PC133 SDRAM.
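The ceiling imposed by the FSB is easy to put into numbers. The sketch below is an illustration under stated assumptions: a 64-bit (8-byte) PIII front side bus at 133.33 MHz, a 64-bit PC133 SDRAM bus, and a 16-bit (2-byte) PC800 RAMBUS channel clocked at 400 MHz with two transfers per clock. The helper name is hypothetical.

```python
def peak_mb_s(bus_bytes, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth of a bus in MB/s."""
    return bus_bytes * clock_mhz * transfers_per_clock


fsb_peak = peak_mb_s(8, 133.33)     # PIII FSB (assumed 64 bits wide)
pc133_peak = peak_mb_s(8, 133.33)   # PC133 SDRAM (assumed 64 bits wide)
pc800_peak = peak_mb_s(2, 400, 2)   # PC800 RAMBUS: 2 bytes, 400 MHz, DDR

# The CPU can never see more than the slower of memory and FSB.
for name, mem_peak in (("PC133 SDRAM", pc133_peak), ("PC800 RAMBUS", pc800_peak)):
    visible = min(mem_peak, fsb_peak)
    print(f"{name}: memory peak {mem_peak:.0f} MB/s, CPU sees at most {visible:.0f} MB/s")
```

Both memory types end up capped at the same FSB figure, which is the point of the paragraph above.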

Secondly, lower latencies will always improve performance, both in sequential accesses (less important) and in random memory accesses (very important), and the initial latency of RAMBUS, as seen from the CPU, is higher than SDRAM's. These latencies can become even higher than the numbers we quoted (11 cycles versus 7 cycles) when RAMBUS powers down because of heat issues.

Is High Bandwidth Useless?

I have been emphasizing the importance of low latency so much that you might get the impression that memory bandwidth is useless. As we pointed out in our previous article, the only big dataflow that really matters is the flow from the memory via the chipset to the CPU and vice versa. Again, bandwidth and latency are very closely related. Even in applications that are not bandwidth critical, bursts of memory activity occur. In other words, even those applications (and that means most applications) have small pieces of code full of memory-hungry instructions that miss the cache. If those instructions depend on each other's results and the memory bandwidth cannot keep up, it takes longer before the data arrives.

Indeed, low bandwidth will increase the latency that the CPU sees! So the last thing I would claim is that high bandwidth is useless. What I do claim is that high-bandwidth memory is useless if the front side bus cannot cope with it. And now we have the benchmarks to prove it! Let us see how much the front side bus and the memory bandwidth and latency affect performance. You will be amazed, I can tell you that.

Memory Affects Performance!

I had two memory types at my disposal, SDRAM and VC SDRAM. VC SDRAM is supposed to improve the effective bandwidth of SDRAM. I included the VC SDRAM figures because it is interesting to see how the different types of memory react. But we will be focusing on the relation between the memory and the front side bus.

In the table under the graph you see three numbers above each result.

The first number indicates the CAS latency.
The second number indicates the speed of the memory stick.
The third number indicates the front side bus speed, the speed between the chipset and the CPU.

Well, VC SDRAM offers between 4 and 7% more bandwidth, nothing to write home about. If a bandwidth-specific benchmark like Stream shows so little improvement, you can imagine that real-world benchmarks will show even smaller benefits.

But much more interesting is the amount of bandwidth offered by the different FSB settings. The 224 MHz FSB setting with the memory clocked at 112 MHz (CPU at 112x8 = 896 MHz) offers slightly more bandwidth than the 180 MHz FSB setting with the RAM clocked at 120 MHz (CPU at 90x10 = 900 MHz). That is weird, as you would have expected the 120 MHz SDRAM to always beat the 112 MHz SDRAM. It seems that the speed of the memory stick is the most important factor, but that the FSB plays a smaller yet still significant role in determining the bandwidth of the memory.
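For readers who want to keep the two configurations apart, here is a small sketch of the clock arithmetic. It assumes that the Athlon's FSB transfers data twice per base clock (so a 112 MHz base clock gives a 224 MHz FSB) and that both the FSB and the SDRAM bus are 64 bits wide; the names are invented for this example. It only tabulates theoretical peaks and does not explain the measured Stream results.

```python
BUS_BYTES = 8  # assumption: 64-bit FSB and 64-bit SDRAM bus

# The two settings compared in the text.
configs = [
    {"base_mhz": 112, "multiplier": 8, "mem_mhz": 112},
    {"base_mhz": 90, "multiplier": 10, "mem_mhz": 120},
]


def describe(cfg):
    """Derive CPU clock, effective FSB clock and peak bandwidths."""
    fsb_mhz = 2 * cfg["base_mhz"]  # double data rate FSB
    return {
        "cpu_mhz": cfg["base_mhz"] * cfg["multiplier"],
        "fsb_mhz": fsb_mhz,
        "fsb_peak_mb_s": fsb_mhz * BUS_BYTES,
        "mem_peak_mb_s": cfg["mem_mhz"] * BUS_BYTES,
    }


for cfg in configs:
    print(describe(cfg))
```

On paper the 120 MHz memory has the higher peak (960 MB/s against 896 MB/s), which is why the measured result is surprising.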

Let us see another Stream benchmark.

VC SDRAM doesn't shine; the differences are small. What is interesting is that VC SDRAM confirms what we saw in the previous benchmark: a high front side bus speed might help in some cases. SDRAM seems to prefer the faster memory speed.

Right now, we only have a few indications that the speed of the FSB might be important. Unfortunately it is not so easy to study this, as I could only play with FSB speeds ranging from 180 to 224 MHz (90 MHz DDR to 112 MHz DDR). I would have loved to set the FSB of the Athlon to 133 MHz and increase it to 200 MHz to see what happens, but that is not possible.

Nevertheless, let us take a look at how Quake 3 reacts to different FSB and memory speeds.

3D Gaming Performance and More

All Content is Copyright (C) 1998-2003 Ace's Hardware. All Rights Reserved.