Intel Developer Forum Fall 2002 - Hyper-Threading & Memory Roadmap
by Anand Lal Shimpi on September 16, 2002 6:05 PM EST
Posted in: Trade Shows
Hyper-Threading and Memory Bandwidth
The concept of Hyper-Threading brings up an interesting question: does Hyper-Threading increase a CPU's dependency on memory bandwidth? The reasoning behind the question is simple; if two threads are executing on the CPU concurrently, then you've effectively halved the amount of cache that each individual thread has access to. Halving the cache means that each thread will end up requesting more data from memory, which increases the CPU's dependency on a high-bandwidth, highly efficient memory subsystem. With RDRAM offering very high bandwidth and much higher theoretical utilization than DDR SDRAM, could Hyper-Threading be RDRAM's killer app?
We posed this question to Pat Gelsinger, and his response took an angle that we hadn't originally thought of. The premise of the increased memory bandwidth requirements argument is that the two threads being executed don't exhibit much (if any) locality. Remember that data is pulled into the CPU's cache based on two principles: spatial locality and temporal locality. Spatial locality states that if one block of data is requested, it's very likely that data located around it will be requested in the near future; this results in the CPU pulling into cache not only the block of data that was requested but also what surrounds it in memory. Temporal locality states that recently used data is likely to be used again soon, which is why it is kept in the cache in the first place.
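To make spatial locality concrete, here is a rough C sketch of our own (not anything Intel provided; the 64MB buffer and the 64-byte cache-line size are assumptions). It reads the same amount of data twice: once sequentially, so every byte of each cache line the CPU fetches gets used, and once with a line-sized stride, so each fetched line contributes only one useful byte and far more lines have to come in from main memory. The strided pass should take noticeably longer despite performing the same number of accesses.

```c
/* locality.c -- rough illustration of spatial locality (illustrative sizes).
 * Sequential traversal uses every byte of each cache line the CPU pulls in;
 * a 64-byte stride uses one byte per line fetched, so the same number of
 * accesses generates far more traffic from main memory.
 * Build: cc -O2 locality.c -o locality
 */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define SIZE (64u * 1024 * 1024)  /* 64 MB: far larger than any CPU cache */
#define LINE 64                   /* assumed cache-line size in bytes */

static double seconds(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    unsigned char *buf = malloc(SIZE);
    struct timespec t0, t1;
    volatile unsigned long sum = 0;

    memset(buf, 1, SIZE);             /* fault the pages in before timing */

    /* Sequential: after the first byte of a line misses, the next 63 hit. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < SIZE; i++)
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("sequential: %.3f s\n", seconds(t0, t1));

    /* Strided: one byte per line, so nearly every access misses in cache. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off < LINE; off++)
        for (size_t i = off; i < SIZE; i += LINE)
            sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("strided:    %.3f s (same total number of accesses)\n", seconds(t0, t1));

    free(buf);
    return 0;
}
```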
If the threads being executed do exhibit reasonable locality, then most of what they need will already be in cache, but what if they don't? It turns out that most multitasking/multithreaded scenarios do show a good deal of locality of reference, but let's say for the sake of argument that they don't. Another very common situation is running a small background application (e.g. virus scanning software) alongside a much larger application. The memory footprint of the background application is usually tiny, and its working set is small enough to fit entirely within the CPU's cache. Although this does take away some cache space from the larger program, it's usually a small enough amount that it doesn't matter, or the larger program is one that goes to main memory frequently anyway, so performance doesn't change.
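One way to see the "working set fits in cache" effect is a sweep like the rough sketch below (again our own illustration; the 16MB ceiling and 64-byte line size are assumptions). Each run performs the same total number of accesses over a working set of growing size; as long as the set fits in cache the run stays fast, and once it outgrows the cache the time jumps because accesses spill out to main memory.

```c
/* workingset.c -- rough sketch of working set size vs. cache capacity.
 * Every run performs the same total number of accesses; once the working
 * set no longer fits in cache, the run slows down noticeably because the
 * accesses start going out to main memory.
 * Build: cc -O2 workingset.c -o workingset
 */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    const size_t max_ws = 16u * 1024 * 1024;   /* 16 MB: larger than typical caches */
    const size_t total  = 256u * 1024 * 1024;  /* bytes swept per run, held constant */
    unsigned char *buf = malloc(max_ws);
    volatile unsigned long sum = 0;

    memset(buf, 1, max_ws);                    /* fault the pages in before timing */

    for (size_t ws = 16 * 1024; ws <= max_ws; ws *= 2) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t done = 0; done < total; done += ws)
            for (size_t i = 0; i < ws; i += 64)   /* one access per cache line */
                sum += buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("working set %6zu KB: %.3f s\n", ws / 1024, secs);
    }

    free(buf);
    return 0;
}
```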
The final scenario is one where you have two very large applications running with next to no locality in their memory accesses; in that case, dependency on a high-bandwidth, highly efficient memory subsystem does go up with Hyper-Threading enabled. These situations are much more specialized and harder to find, but they are something to keep in mind when investigating HT performance.
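A crude way to approximate that worst case is the pthreads sketch below (our own, hypothetical illustration; the buffer sizes, pass count, and 64-byte stride are assumptions). Each thread streams through its own 64MB buffer with essentially no reuse, so running two of them at once forces both to share the cache and the path to main memory; comparing the one-thread and two-thread timings gives a feel for how quickly the memory subsystem becomes the bottleneck.

```c
/* contention.c -- sketch of two large-footprint threads sharing one memory system.
 * Each thread strides through its own 64 MB buffer with little reuse (the worst
 * case described above). Compare the one-thread and two-thread timings to see
 * how much the shared cache and memory bus slow things down.
 * Build: cc -O2 contention.c -o contention -lpthread
 */
#define _POSIX_C_SOURCE 199309L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (64u * 1024 * 1024)
#define PASSES   8

static void *stream(void *arg) {
    unsigned char *buf = arg;
    volatile unsigned long sum = 0;
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < BUF_SIZE; i += 64)   /* one access per cache line */
            sum += buf[i];
    return NULL;
}

static double run(int nthreads) {
    pthread_t tid[2];
    unsigned char *bufs[2];
    struct timespec t0, t1;

    for (int i = 0; i < nthreads; i++) {
        bufs[i] = malloc(BUF_SIZE);
        memset(bufs[i], 1, BUF_SIZE);      /* fault the pages in before timing */
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, stream, bufs[i]);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (int i = 0; i < nthreads; i++)
        free(bufs[i]);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("one thread:  %.3f s\n", run(1));
    printf("two threads: %.3f s\n", run(2));
    return 0;
}
```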