Valve Hardware Day 2006 - Multithreaded Edition
by Jarred Walton on November 7, 2006 6:00 AM EST - Posted in Trade Shows
Test Setup
Obviously Valve is pretty excited about what can be done with additional processing power, and they have invested a lot of time and resources into building tools that take advantage of the possibilities. However, Valve is a software developer rather than a hardware review site, and our impression is that most of their systems are typical of any business these days: they are purchased from Dell or some other large OEM, which means they are somewhat limited in the variety of hardware available. That's not to say that Valve hasn't tested AMD hardware, because they have, but once they reached the conclusion that Core 2 Duo/Core 2 Quad would be faster, they probably didn't bother with much additional testing. We, of course, are more interested in seeing what these new multiprocessor benchmarks can tell us about AMD and Intel hardware -- past, present, and future -- and we plan on utilizing these tests in future articles. As a brief introduction to these benchmark utilities, however, we thought it would be useful to run them on a few of our current platforms to see how they fare.
In the interest of time, we did not try to keep all of the tested platforms identical in terms of components. Limited testing showed that the processor is definitely the major bottleneck in both benchmarks, with a variance of less than 5% between benchmark runs on all platforms. Besides the processor, the only other component that has any significant impact on benchmark performance is memory, through its bandwidth and timings. We ran both benchmarks three times on each platform, threw out the high and low scores, and took the remaining median score. In many instances, the first run of the particle simulation benchmark was slightly slower than the next two runs, which were usually equal in performance. The variability between runs of the map compilation test was less than 1%, so those results were very consistent.
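For anyone who wants to sanity check our numbers, the reduction from three runs to a single score is simple enough to express in a few lines of code. This is purely a sketch of the arithmetic -- the function names are ours and have nothing to do with Valve's benchmark internals:

```cpp
#include <algorithm>

// Median of three benchmark runs: sorting and taking the middle value
// is the same as throwing out the high and low scores.
double MedianOfThree(double a, double b, double c) {
    double runs[3] = {a, b, c};
    std::sort(runs, runs + 3);
    return runs[1];
}

// Run-to-run spread as a fraction of the median score; 0.05 here
// corresponds to the ~5% worst-case variance we observed.
double RunSpread(double a, double b, double c) {
    double lo = std::min({a, b, c});
    double hi = std::max({a, b, c});
    return (hi - lo) / MedianOfThree(a, b, c);
}
```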
Here are the details of the tested systems.
| Athlon 64 3200+ (Socket 939) | Stock | Overclocked |
|------------------------------|-------|-------------|
| CPU | Athlon 64 3200+ (939) - 2.0GHz, 512K | 3200+ @ 10x240 HTT = 2.40GHz |
| Motherboard | ASUS A8N-VM CSM - nForce 6150 | |
| Memory | 2x1GB OCZ OCZ5001024EBPE - DDR-400 2-3-2-7 1T | DDR-480 3-3-2-7 1T |
| GPU | X1900 XT | |
| HDD | Seagate 7200.9 250GB 8MB cache 7200 RPM (SATA 3.0Gbps) | |

| Athlon X2 3800+ (Socket 939) | Stock | Overclocked |
|------------------------------|-------|-------------|
| CPU | Athlon X2 3800+ (939) - 2.0GHz, 2x512K | 3800+ @ 10x240 HTT = 2.40GHz |
| Motherboard | ASUS A8R32-MVP - ATI Xpress 3200 | |
| Memory | 2x1GB OCZ OCZ5001024EBPE - DDR-400 2-3-2-7 1T | DDR-480 3-3-2-7 1T |
| GPU | X1900 XT | |
| HDD | Western Digital SE16 WD2500KS 250GB 16MB cache 7200 RPM (SATA 3.0Gbps) | |

| Athlon X2 3800+ (Socket AM2) | Stock | Overclocked |
|------------------------------|-------|-------------|
| CPU | Athlon X2 3800+ (AM2) - 2.0GHz, 2x512K | 3800+ @ 10x240 HTT = 2.40GHz |
| Motherboard | Foxconn C51XEM2AA - nForce 590 SLI | |
| Memory | 2x1GB Corsair PC2-8500C5 - DDR2-800 4-4-4-12 | DDR2-960 4-4-4-12 |
| GPU | X1900 XT | |
| HDD | Western Digital SE16 WD2500KS 250GB 16MB cache 7200 RPM (SATA 3.0Gbps) | |

| Core 2 Duo E6700 (NF570) | Stock | Overclocked |
|--------------------------|-------|-------------|
| CPU | Core 2 Duo E6700 - 2.67GHz, 4096K | E6700 @ 10x320 FSB = 3.20GHz |
| Motherboard | ASUS P5NSLI - nForce 570 SLI for Intel | |
| Memory | 2x1GB Corsair PC2-8500C5 - DDR2-800 4-4-4-12 | DDR2-960 4-4-4-12 |
| GPU | X1900 XT | |
| HDD | Western Digital Raptor 150GB 16MB cache 10,000 RPM | |

| Core 2 Quad QX6700 (975X) | Stock | Overclocked |
|---------------------------|-------|-------------|
| CPU | Core 2 Quad QX6700 - 2.67GHz, 2x4096K | QX6700 @ 10x320 FSB = 3.20GHz |
| Motherboard | ASUS P5W DH Deluxe - 975X | |
| Memory | 2x1GB Corsair PC2-8500C5 - DDR2-800 4-4-4-12 | DDR2-960 4-4-4-12 |
| GPU | X1900 XT | |
| HDD | 2x Western Digital Raptor 150GB in RAID 0 | |

| Pentium D 920 (945P) | Stock | Overclocked |
|----------------------|-------|-------------|
| CPU | Pentium D 920 - 2.8GHz, 2x2048K | 920 @ 14x240 FSB = 3.36GHz |
| Motherboard | ASUS P5LD2 Deluxe - 945P | |
| Memory | 2x1GB Corsair PC2-8500C5 - DDR2-667 4-4-4-12 | DDR2-800 4-4-4-12 |
| GPU | X1900 XT | |
| HDD | Western Digital SE16 WD2500KS 250GB 16MB cache 7200 RPM (SATA 3.0Gbps) | |
We did test all of the systems with the same graphics card, just to be consistent, but it made little to no difference. On the Athlon 64 configuration, for example, we got the same results using the integrated graphics as we did with the X1900. We also tested at different resolutions and found once again that, on the graphics cards we used, resolution had no measurable impact on the final score: 640x480 generated the same results as 1920x1200, even with all of the eye candy enabled at the high resolution and everything disabled at the low resolution. To be consistent, all of the benchmarking was done at the default 1024x768 0xAA/8xAF. We also tried to stay consistent on the memory we used -- for both DDR and DDR2 -- though the Pentium D test system had issues and would not run the particle simulation benchmark. Finally, to give a quick look at performance scaling, we overclocked all of the tested systems by 20%.
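The 20% figure falls straight out of the multiplier times bus speed arithmetic shown in the tables above. Here's a trivial sketch using the Athlon 64 numbers, purely for illustration:

```cpp
#include <cstdio>

// Core clock is simply multiplier x bus speed; the CPU entries in the
// tables above all follow this formula.
double CoreClockMHz(double multiplier, double busMHz) {
    return multiplier * busMHz;
}

int main() {
    double stock = CoreClockMHz(10, 200);  // Athlon 64 3200+: 10x200 = 2000MHz
    double oc    = CoreClockMHz(10, 240);  // overclocked:     10x240 = 2400MHz
    // Prints "Overclock: 20%"
    std::printf("Overclock: %.0f%%\n", (oc / stock - 1.0) * 100.0);
    return 0;
}
```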
For now we are merely providing a short look at what Valve has been working on, along with some preliminary benchmarks. We intend to use these benchmarks in future articles as well, where we will look at additional system configurations. Note that performance differences of one or two points should not be taken as significant in the particle simulation test, as the granularity of the reported scores is relatively coarse.
Comments
Nighteye2 - Wednesday, November 8, 2006

Ok, so that's how Valve will implement multi-threading. But what about other companies, like Epic? How does the latest Unreal Engine multi-thread?

Justin Case - Wednesday, November 8, 2006

Why aren't any high-end AMD CPUs tested? You're testing 2GHz AMD CPUs against 2.6+ GHz Intel CPUs. Doesn't AnandTech have access to faster AMD chips? I know the point of the article is to compare single- and multi-core CPUs, but it seems a bit odd that all the Intel CPUs are top-of-the-line while all the AMD CPUs are low end.

JarredWalton - Wednesday, November 8, 2006

AnandTech? Yes. Jarred? Not right now. I have a 5000+ AM2, but you can see that performance scaling doesn't change the situation. 1MB AMD chips do perform better than 512K versions, almost equaling a full CPU bin: a 2.2GHz Opteron on 939 was nearly equal to the 2.4GHz 3800+ (both OC'ed). A 2.8GHz FX-62 still isn't going to equal any of the upper Core 2 Duo chips.

archcommus - Tuesday, November 7, 2006

It must be a really great feeling for Valve knowing they have the capacity and capability to deliver this new engine to EVERY customer and player of their games as soon as it's ready. What a massive and ugly patch that would be for virtually any other developer.

Don't really see how you could hate on Steam nowadays considering things like that. It's really powerful and works really well.
Zanfib - Tuesday, November 7, 2006

While I design software (so not so much programming as GUI design and whatnot), I can remember my University courses dealing with threading, and all the pain threading can bring.

I predicted (though I'm sure many could say this and I have no public proof) that Valve would be one of the first to do such work; they are a very forward-thinking company with large resources (like Google -- if they want to work on ANYthing, they can), a great deal of experience, and (as noted in the article) the content delivery system to support it all.
Great article about a great subject, goes a long way to putting to rest some of the fears myself and others have about just how well multi-core chips will be used (with the exception of Cell, but after reading a lot about Cell's hardware I think it will always be an insanely difficult chip to code for).
Bonesdad - Tuesday, November 7, 2006

mmmmmmmmm, chicken and mashed potatoes....

Aquila76 - Tuesday, November 7, 2006

Jarred, I wanted to thank you for explaining in terms simple enough for my extremely non-technical wife to understand why I just bought a dual-core CPU! That was a great progression on it as well, going through the various multi-threading techniques. I am saving that for future reference.

archcommus - Tuesday, November 7, 2006

Another excellent article, I am extremely pleased with the depth your articles provide, and somehow, every time I come up with questions while reading, you always seem to answer exactly what I was thinking! It's great to see you can write on a technical level but still think like a common reader so you know how to appeal to them.

With regards to Valve, well, I knew they were the best since Half-Life 1 and it still appears to be so. I remember back in the days when we weren't even sure if Half-Life 2 was being developed. Fast forward a few years and Valve is once again revolutionizing the industry. I'm glad HL2 was so popular as to give them the monetary resources to do this kind of development.
Right now I'm still sitting on a single core system with XP Pro and have lots of questions bustling in my head. What will be the sweet spot for Episode 2? Will a quad core really offer substantially better features than a dual core, or a dual core over a single core? Will Episode 2 be fully DX10, and will we need DX10 compliant hardware and Vista by its release? Will the rollout of the multithreaded Source engine affect the performance I already see in HL2 and Episode 1? Will Valve actually end up distributing different versions of the game based on your hardware? I thought that would not be necessary due to the fact that their engine is specifically designed to work for ANY number of cores, so that takes care of that automatically. Will having one core versus four make big graphical differences or only differences in AI and physics?
Like you said yourself, more questions than answers at this point!
archcommus - Tuesday, November 7, 2006

One last question I forgot to put in. Say it was somehow possible to build a 10 or 15 GHz single-core CPU with reasonable heat output. Would this be better than the multi-core direction we are moving towards today? In other words, are we only moving to multi-core because we CAN'T increase clock speeds further, or is this the preferred direction even if we could?

saratoga - Tuesday, November 7, 2006

You got it.

A higher clock speed processor would be better, assuming performance scaled well enough anyway. Parallel hardware is less general than serial hardware at increasing performance, because it requires parallelism to be present in the workload. If the work is highly serial, then adding parallelism to the hardware does nothing at all. Conversely, even if the workload is highly parallel, doubling serial performance still doubles performance. Doubling the width of a unit could double the performance of that unit for certain workloads, while doing nothing at all for others. In general, if you can accelerate the entire system equally, doubling serial performance will always double program speed, regardless of the program.

That's the theory, anyway. Practice says you can only make certain parts faster. So you might get away with doubling clock speed, but probably not halving memory latency, so your serial performance doesn't scale like you'd hope. Not to mention that increasing serial performance is extremely expensive compared to parallel performance. But if it were possible, no one would ever bother with parallelism. It's a huge pain in the ass from a software perspective, and it's becoming big now mostly because we're starting to run out of tricks to increase serial performance.
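For reference, the tradeoff saratoga describes is captured by Amdahl's law. A minimal sketch, with a 50/50 serial/parallel split assumed purely for illustration:

```cpp
#include <cstdio>

// Amdahl's law: overall speedup when a fraction p of the work is
// parallelizable and that portion is accelerated by a factor of n.
double AmdahlSpeedup(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main() {
    // Doubling serial speed accelerates all of the work equally,
    // which is the p = 1.0 case: a 2.00x speedup for any workload.
    std::printf("2x serial speed: %.2fx\n", AmdahlSpeedup(1.0, 2.0));
    // Doubling core count only helps the parallel fraction. If half
    // the work is serial (p = 0.5), two cores give just 1.33x.
    std::printf("2 cores, p=0.5:  %.2fx\n", AmdahlSpeedup(0.5, 2.0));
    return 0;
}
```

The smaller the parallel fraction, the less extra cores buy you, which is exactly why the work Valve has put into keeping as much of the engine parallel as possible matters for quad-core performance.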