Intel Developer Forum Spring 2004 - Day 1: Wider and Faster
by Derek Wilson on February 18, 2004 10:41 AM EST- Posted in
- Trade Shows
PCI Express Product Launches
NVIDIA and ATI have officially begun the switch to PCI Express by launching their initial products. NVIDIA implements PCI Express via a bridge chip while ATI's solutions are native. Our ATI Roadmap article shows the lineup we can expect to see come out of ATI this year. NVIDIA launched four PCI Express solutions dubbed GeForce PCX:
GeForce PCX 5950
GeForce PCX 5750
GeForce PCX 5300
GeForce PCX 4300
The PCX 4300 is an iteration of NVIDIA's MX value line, while the 5300 and 5750 are bridged 5200 and 5700 cards, respectively.
Apparently ATI demonstrated a "next generation" GPU at a presentation they gave on Tuesday showing off 2 to 3x performance gains over their current generation cards running DX9. Of course, it was an unreleased game on unreleased hardware, we didn't get to run the benchmarks ourselves, and apparently the performance of the next gen part was just at acceptable levels.
ATI is really pushing the idea that native PCI Express gives them an advantage in things like running multiple HD streams at higher frame rates than AGP or a bridged solution allow, but again, we need hardware to test in order to verify these claims.
We will have longer meetings with both ATI and NVIDIA over the next couple of days, and we will be sure to report on what we learn.
17 Comments
TrogdorJW - Thursday, February 19, 2004 - link
If anyone is interested in reading more speculation on Prescott and how it gets 64-bits, I posted some of my *theories* over at the FiringSquad forums. Here's the link for the complete discussion: http://forums.firingsquad.com/firingsquad/board/me...
The important part is as follows:
----------------------------------
The big question now is, how well will Prescott-64 perform? I think that they can get the heat under control. (More speculation.) However, maximizing 64-bit performance might be a bit more difficult. Look at AMD with the stuff they've licensed from Intel. Intel still beats them in MMX, SSE, and now SSE2 performance (although they are getting closer with each new processor release).
Some other interesting things about the news: Intel is going to clock the ALUs (Arithmetic Logic Units) at core speed when running 64-bit code, apparently. Actually, they say 7 GHz in 32-bit mode and 4 GHz in 64-bit mode. That's a little odd, since the current ALUs run at twice the core speed in 32-bit mode, so 7 GHz would be from a 3.5 GHz processor. Why they would run at 4 GHz and not 3.5 GHz I couldn't say. Maybe because they can?
How will that affect performance? It depends on how the 64-bit extensions were added. If they use the same setup as the regular P4 core, with the only difference being that they added registers and made them 64-bits wide, then it would likely hurt performance relative to 32-bit mode. However, it is *possible* that the 64-bit was added on as a completely separate module. If this is the case, they might have separate 64-bit ALUs/AGUs. In other words, the current NetBurst design has 7 functional units: Two simple ALUs that run at 2X core speed, one complex ALU running at core speed, an FPU/SSE Move/Store, a full FPU/SSE that handles all of those operations, and two AGUs (Address Generation Units). The 64-bit extensions in Prescott/Nocona/Potomac (called "Clackamas Technology") could have their own AGUs and ALUs.
That would make sense to me, since as I mentioned in my earlier speculation, the core is currently about 73 million transistors compared to the Northwood's 29 million. Northwood has 7 functional units with 20 pipeline stages, giving about 205,000 transistors per stage per unit (29 million / (20 * 7)). If the Prescott design simply extended the NetBurst architecture to 64-bits and 31 stages, it would be around 335,000 transistors per stage per unit (73 million / (31 * 7)). On the other hand, if the 64-bit extensions are added in a separate module with their own AGUs and ALUs, the Prescott would now have 11 functional units. That would give 214,000 transistors per stage per unit (73 million / (31 * 11)). An even more radical approach might be to have three 64-bit AGUs and ALUs. Then you would only have about 178,000 transistors per stage per unit. (That's a little harder to believe, but since Intel is being forced to adopt AMD's instruction set, they might want to adopt the architecture for performance reasons.) Note: These are all very rough estimates. FPUs generally have more pipeline stages, and there are lots of other factors to consider, like the L1 cache and trace cache. This is just a baseline estimate.
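For anyone who wants to rerun the division, here is a minimal sketch of the estimate above. The function name and the scenario labels are made up for illustration, the transistor counts are the rough figures quoted in this comment, and the 13-unit case assumes "three 64-bit AGUs and ALUs" means six units added to the original seven.

```python
def transistors_per_stage_per_unit(transistors, stages, units):
    """Average transistor count per pipeline stage per functional unit."""
    return transistors / (stages * units)

# (transistor count, pipeline stages, functional units)
# All inputs are the rough figures from the comment, not measured data.
scenarios = {
    "Northwood, 7 units": (29e6, 20, 7),
    "Prescott, same 7 units": (73e6, 31, 7),
    "Prescott, 11 units (2 extra ALUs + 2 extra AGUs)": (73e6, 31, 11),
    "Prescott, 13 units (assumed: 3 extra ALUs + 3 extra AGUs)": (73e6, 31, 13),
}

results = {
    name: transistors_per_stage_per_unit(*args)
    for name, args in scenarios.items()
}

for name, value in results.items():
    print(f"{name}: {value:,.0f} transistors per stage per unit")
```

With 73 million as the input, the four cases come out near 207,000, 336,000, 214,000, and 181,000, so the rounded figures quoted above are in the right range but not exact.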
As I stated earlier, increasing the number of transistors by such a large amount without adding more functional units would make the Prescott design scale worse than the Northwood design. Why would Intel do that!? Going to 31 stages would have been done to decrease the average number of transistors per stage, and they would likely aim to be at worst about the same as the Northwood. I certainly don't know for sure what was done, but various rumors and the fact that the 32-bit and 64-bit ALUs run at different speeds make me wonder. I suppose we'll know more in about two or three months, if not before then.
TrogdorJW - Thursday, February 19, 2004 - link
PC3200 is 3.2 GB/s single channel, and dual-channel it is 6.4 GB/s. XDR single-channel is 6.4 GB/s, so in a dual-channel setup (which is very likely, since almost all Rambus implementations in the past were dual-channel other than i820 - and we all know what a fiasco that was!) XDR will be 12.8 GB/s.
It is important to note that DDR normally uses a 64-bit bus, whereas RDRAM/XDR apparently use a 16-bit bus. Running 64 traces over a motherboard at high clock speeds is difficult at best, but if you cut that to 16 traces, it is not as hard. That's what Rambus was all about initially. Now, DDR is running 200 MHz (400 effective) with 128 traces in dual-channel operation. XDR is countering by running 16 traces at 400 MHz (3200 effective).
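The bandwidth numbers above follow from bus width times effective transfer rate. A small sketch, assuming the 64-bit and 16-bit widths and the 400 and 3200 effective rates quoted in this comment (the function name is hypothetical):

```python
def peak_bandwidth_gb_s(bus_width_bits, mega_transfers_per_s, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB here)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * channels / 1000

# Figures taken from the comment: DDR400 on a 64-bit bus, XDR on a 16-bit bus.
pc3200_single = peak_bandwidth_gb_s(64, 400)
pc3200_dual = peak_bandwidth_gb_s(64, 400, channels=2)
xdr_single = peak_bandwidth_gb_s(16, 3200)
xdr_dual = peak_bandwidth_gb_s(16, 3200, channels=2)

print(f"PC3200 single/dual: {pc3200_single:.1f} / {pc3200_dual:.1f} GB/s")
print(f"XDR single/dual:    {xdr_single:.1f} / {xdr_dual:.1f} GB/s")
```

These are theoretical peaks only; they say nothing about latency or sustained throughput.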
I find it interesting that the clock speed of XDR is really 400 MHz externally, but then internally they send eight bits per clock. From what was said in the article, I guess they first multiply the clock by four, and then they more or less use DDR tactics where you send a bit on the rising and falling clock. The end result, though, is the same. DDR2 does something similar, I believe. "1 GHz" DDR2 is really running at 250 MHz, with four bits per clock. So they double the clock and then send data on the rising and falling clock signal.
In order to match XDR, DDR2 would have to run at 200 MHz and an effective 800 MHz. We're seeing that on graphics cards, but it looks like that is still a ways off for motherboards. The latency question is still not really being answered by Rambus. "Low latency" at 3.2 GHz effective speeds could mean anything. I have seen that DDR2 is only offering CAS Latencies of 3, 4, and 5. I wonder what the equivalent XDR latency is - probably something like 6, 8, and 10.
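Here is a rough sketch of the "bits per clock" arithmetic, working backwards to the DDR2 clock needed to match one XDR channel. The helper names are invented for illustration, and the 64-bit DDR2 bus width is an assumption carried over from the DDR figure mentioned earlier.

```python
def effective_rate_mt_s(base_clock_mhz, bits_per_clock):
    """Effective transfer rate (MT/s) per data pin."""
    return base_clock_mhz * bits_per_clock

def base_clock_needed_mhz(target_mb_s, bus_width_bits, bits_per_clock):
    """Base clock required to reach a target peak bandwidth in MB/s."""
    bytes_per_transfer = bus_width_bits // 8
    transfers_needed = target_mb_s / bytes_per_transfer
    return transfers_needed / bits_per_clock

xdr_rate = effective_rate_mt_s(400, 8)    # XDR: 400 MHz, eight bits per clock
ddr_rate = effective_rate_mt_s(200, 2)    # DDR400: 200 MHz, two bits per clock
ddr2_rate = effective_rate_mt_s(250, 4)   # "1 GHz" DDR2: 250 MHz, four bits per clock

# One 16-bit XDR channel moves 2 bytes per transfer.
xdr_single_channel_mb_s = (16 // 8) * xdr_rate

# Assumed: DDR2 on a 64-bit bus with four bits per clock.
ddr2_clock = base_clock_needed_mhz(xdr_single_channel_mb_s, 64, 4)

print(xdr_rate, ddr_rate, ddr2_rate, xdr_single_channel_mb_s, ddr2_clock)
```

Under those assumptions the required DDR2 base clock works out to 200 MHz, which is the 800 effective figure mentioned above.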
If/when retail boards are released using XDR, it could be an exciting matchup. Prescott at 4 GHz could make very good use of added memory bandwidth, I bet. Integrated graphics with a 12.8 GB/s memory subsystem might actually not suck that hard! :)
Malladine - Thursday, February 19, 2004 - link
Oops...actually wanted to ask #13 if that means that PC3200 is 1.6 GB/s single channel?
Malladine - Thursday, February 19, 2004 - link
bhtooefr - Thursday, February 19, 2004 - link
#3, he said dual channel. Single channel IS 3.2GB/s, but dual channel is 6.4. I was going to point out that DC-DDR was the same speed as XDR.
(my own comment) Remember when Northwood came out, and it didn't have HyperThreading enabled, but later releases enabled it? Well, I wouldn't be surprised if the P4-F or P4-G is a Prescott-64.
KalTorak - Thursday, February 19, 2004 - link
The processor data bus has been 64 bits wide since the original Pentium processor, as I recall.
AgaBooga - Thursday, February 19, 2004 - link
I predicted months ago, from reading Anand's articles, that the 64-bit feature would be a lot like Hyper-Threading sitting disabled in the Willamette cores. I'm expecting something similar to happen this time around. I think Intel's whole timing thing is true to an extent, but had the 64-bit mode helped, or at least not decreased performance by a lot, they might have released it enabled.
Pumpkinierre - Wednesday, February 18, 2004 - link
Even if you do enable 64-bit functionality in Prescott, won't you need a mobo or at least a BIOS upgrade to handle it? You probably don't need the memory size extension on the address bus, but I don't know the size of the data bus on the Prescott. If it is only 32 bits wide then it would need to carry out two fetches for full 64-bit functionality (plus internal 64-bit manipulation), but this would require a change to the microcode in the BIOS. Unless this is already present in Prescott BIOS upgrades for mobos (i875/865), you may have difficulties even if you 'switch on' the x86-64 commands. I suspect they are not going to enable it until Socket 775.
Jeff7181 - Wednesday, February 18, 2004 - link
One thing I wonder about is how flexible x86-64 is. Could it go through a revision that drops support for 32-bit instructions to enhance 64-bit performance when 64-bit software is the only software you can buy?
KalTorak - Wednesday, February 18, 2004 - link
And given the iffy bandwidth available at Moscone West, I think Derek's doing pretty well to get these reports in :)
The XDR stuff _was_ pretty cool; I didn't realize there was a clock source with low enough jitter to make that thing work.