Intel Developer Forum Spring 2004 - Day 1: Wider and Faster
by Derek Wilson on February 18, 2004 10:41 AM EST - Posted in Trade Shows
Intel x86 Extensions?
Intel has extended the x86 instruction set in the past, and it could have done so once again when 4GB of memory was no longer enough for desktop users. AMD may have jumped the gun slightly on 64-bit computing, as most of us don't even see beyond 2GB of RAM in our systems today. Of course, by jumping the gun AMD was able to add some performance-enhancing features to its current line of CPUs that could make them more attractive to customers than Intel's CPUs.
Unless Intel added even more impressive architectural changes than AMD, outpacing what is already out there, there would be little reason to buy an Intel processor. Another very big advantage of AMD's preemptive push toward 64-bit computing is that by the time Intel could bring out its own extensions, operating system and software support would already be there for x86-64. We've seen firsthand how long it has taken Microsoft to bring AMD's extensions to Windows XP, and there is no reason to believe that Microsoft would be quicker to support Intel's enhancements (branching an OS, rewriting parts of it, and putting it all together with support and documentation is not a quick and easy task).
All signs indicate that Intel could not realistically release its own x86 extensions that are incompatible with AMD's. That brings us to the final option Intel had for 64-bit computing on x86.
17 Comments
TrogdorJW - Thursday, February 19, 2004 - link
If anyone is interested in reading more speculation on Prescott and how it gets 64-bits, I posted some of my *theories* over at the FiringSquad forums. Here's the link for the complete discussion: http://forums.firingsquad.com/firingsquad/board/me...
The important part is as follows:
----------------------------------
The big question now is, how well will Prescott-64 perform? I think that they can get the heat under control. (More speculation.) However, maximizing 64-bit performance might be a bit more difficult. Look at AMD with the stuff they've licensed from Intel. Intel still beats them in MMX, SSE, and now SSE2 performance (although they are getting closer with each new processor release).
Some other interesting things about the news: Intel is going to clock the ALUs (Arithmetic Logic Units) at core speed when running 64-bit code, apparently. Actually, they say 7 GHz in 32-bit mode and 4 GHz in 64-bit mode. That's a little odd, since the current ALUs run at twice the core speed in 32-bit mode, so 7 GHz would be from a 3.5 GHz processor. Why they would run at 4 GHz and not 3.5 GHz I couldn't say. Maybe because they can?
How will that affect performance? It depends on how the 64-bit extensions were added. If they use the same setup as the regular P4 core, with the only difference being that they added registers and made them 64-bits wide, then it would likely hurt performance relative to 32-bit mode. However, it is *possible* that the 64-bit was added on as a completely separate module. If this is the case, they might have separate 64-bit ALUs/AGUs. In other words, the current NetBurst design has 7 functional units: Two simple ALUs that run at 2X core speed, one complex ALU running at core speed, an FPU/SSE Move/Store, a full FPU/SSE that handles all of those operations, and two AGUs (Address Generation Units). The 64-bit extensions in Prescott/Nocona/Potomac (called "Clackamas Technology") could have their own AGUs and ALUs.
That would make sense to me, since as I mentioned in my earlier speculation, the core is currently about 73 million transistors compared to the Northwood's 29 million. Northwood has 7 functional units with 20 pipeline stages, giving about 207,000 transistors per stage per unit (29 million / (20 * 7)). If the Prescott design simply extended the NetBurst architecture to 64-bits and 31 stages, it would be around 336,000 transistors per stage per unit (73 million / (31 * 7)). On the other hand, if the 64-bit extensions are added in a separate module with their own AGUs and ALUs, the Prescott would now have 11 functional units. That would give 214,000 transistors per stage per unit (73 million / (31 * 11)). An even more radical approach might be to have three 64-bit AGUs and ALUs, for 13 functional units in total. Then you would only have about 181,000 transistors per stage per unit (73 million / (31 * 13)). (That's a little harder to believe, but since Intel is being forced to adopt AMD's instruction set, they might want to adopt the architecture for performance reasons.) Note: These are all very rough estimates. FPUs generally have more pipeline stages, and there are lots of other factors to consider, like the L1 cache and trace cache. This is just a baseline estimate.
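For anyone who wants to rerun the back-of-the-envelope numbers above, here is a minimal Python sketch. The function name and scenario labels are made up for illustration, and the 13-unit case assumes that "three 64-bit AGUs and ALUs" means six extra units on top of the original seven. The values are computed exactly, so they can differ by a percent or two from rounded figures quoted in the text.

```python
# Rough "transistors per pipeline stage per functional unit" estimate.
# Transistor counts, stage counts, and unit counts come from the comment above;
# the 13-unit scenario (7 + 3 ALUs + 3 AGUs) is an assumed reading of it.

def transistors_per_stage_per_unit(transistors, stages, units):
    """Divide the transistor budget evenly across stages and functional units."""
    return transistors / (stages * units)

NORTHWOOD_TRANSISTORS = 29_000_000
PRESCOTT_TRANSISTORS = 73_000_000

scenarios = {
    "Northwood, 20 stages, 7 units": (NORTHWOOD_TRANSISTORS, 20, 7),
    "Prescott, 31 stages, 7 units": (PRESCOTT_TRANSISTORS, 31, 7),
    "Prescott, 31 stages, 11 units": (PRESCOTT_TRANSISTORS, 31, 11),
    "Prescott, 31 stages, 13 units": (PRESCOTT_TRANSISTORS, 31, 13),
}

results = {
    name: transistors_per_stage_per_unit(*args)
    for name, args in scenarios.items()
}

for name, value in results.items():
    print(f"{name}: about {value:,.0f} transistors per stage per unit")
```

The even split is the whole model here; it ignores caches and the fact that different units have different pipeline depths, which is why these are only baseline figures.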
As I stated earlier, increasing the number of transistors by such a large amount without adding more functional units would make the Prescott design scale worse than the Northwood design. Why would Intel do that!? Going to 31 stages would have been done to decrease the average number of transistors per stage, and they would likely aim to be at worst about the same as the Northwood. I certainly don't know for sure what was done, but various rumors and the fact that the 32-bit and 64-bit ALUs run at different speeds make me wonder. I suppose we'll know more in about two or three months, if not before then.
TrogdorJW - Thursday, February 19, 2004 - link
PC3200 is 3.2 GB/s single channel, and dual-channel it is 6.4 GB/s. XDR single-channel is 6.4 GB/s, so in a dual-channel setup (which is very likely, since almost all Rambus implementations in the past were dual-channel other than i820 - and we all know what a fiasco that was!) XDR will be 12.8 GB/s.
It is important to note that DDR is normally a 64-bit bus, whereas RDRAM/XDR apparently use a 16-bit bus. Running 64 traces over a motherboard at high clock speeds is difficult at best, but if you cut that to 16 traces, it is not as hard. That's what Rambus was all about initially. Now, DDR is running 200 MHz (400 effective) with 128 traces in dual-channel operation. XDR is countering by running 16 traces at 400 MHz (3200 effective).
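The bandwidth figures above all come from the same formula: bus width in bytes, times transfers per second, times the number of channels. A minimal Python sketch follows. The function name is made up for illustration, and the bus widths and effective rates are the ones quoted in this comment, not values taken from a datasheet.

```python
# Peak theoretical bandwidth = (bus width in bytes) * (transfers per second) * channels.
# Bus widths and effective rates are the figures quoted in the comment above.

def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second, channels=1):
    """Return peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second * channels / 1e9

# PC3200 DDR: 64-bit bus, 200 MHz clock, 400 million transfers per second.
pc3200_single = peak_bandwidth_gb_s(64, 400e6)
pc3200_dual = peak_bandwidth_gb_s(64, 400e6, channels=2)

# XDR: 16-bit bus, 400 MHz clock, 3200 million transfers per second.
xdr_single = peak_bandwidth_gb_s(16, 3200e6)
xdr_dual = peak_bandwidth_gb_s(16, 3200e6, channels=2)

print(f"PC3200 single/dual channel: {pc3200_single:.1f} / {pc3200_dual:.1f} GB/s")
print(f"XDR single/dual channel:    {xdr_single:.1f} / {xdr_dual:.1f} GB/s")
```

This makes the trade-off explicit: XDR gets twice the bandwidth of PC3200 from a quarter of the data traces by running each trace eight times faster.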
I find it interesting that the clock speed of XDR is really 400 MHz externally, but then internally they send eight bits per clock. From what was said in the article, I guess they first multiply the clock by four, and then they more or less use DDR tactics where you send a bit on the rising and falling clock. The end result, though, is the same. DDR2 does something similar, I believe. "1 GHz" DDR2 is really running at 250 MHz, with four bits per clock. So they double the clock and then send data on the rising and falling clock signal.
In order to match XDR, DDR2 would have to run at 200 MHz and an effective 800 MHz. We're seeing that on graphics cards, but it looks like that is still a ways off for motherboards. The latency question is still not really being answered by Rambus. "Low latency" at 3.2 GHz effective speeds could mean anything. I have seen that DDR2 is only offering CAS Latencies of 3, 4, and 5. I wonder what the equivalent XDR latency is - probably something like 6, 8, and 10.
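The clock arithmetic in the last two paragraphs can be checked with a short Python sketch. The function names are made up for illustration, and "bits per clock" follows the simplified model used in this comment (eight per external clock for XDR, four per base clock for DDR2) rather than any official specification.

```python
# Effective data rate = base clock * bits sent per pin per clock.
# The per-clock multipliers follow the simplified model in the comment above.

def effective_rate_mt_s(clock_mhz, bits_per_clock):
    """Effective transfer rate in millions of transfers per second."""
    return clock_mhz * bits_per_clock

def required_rate_mt_s(target_gb_s, bus_width_bits):
    """Transfer rate (MT/s) a bus of the given width needs to reach target_gb_s."""
    bytes_per_transfer = bus_width_bits / 8
    return target_gb_s * 1e9 / bytes_per_transfer / 1e6

xdr_rate = effective_rate_mt_s(400, 8)    # XDR: 400 MHz clock, 8 bits per clock
ddr2_rate = effective_rate_mt_s(250, 4)   # "1 GHz" DDR2: 250 MHz clock, 4 bits per clock

# What a 64-bit DDR2 channel needs to match one 6.4 GB/s XDR channel.
ddr2_needed_rate = required_rate_mt_s(6.4, 64)
ddr2_needed_clock = ddr2_needed_rate / 4  # back to the base clock at 4 bits per clock

print(f"XDR effective rate: {xdr_rate} MT/s")
print(f"DDR2 effective rate at 250 MHz: {ddr2_rate} MT/s")
print(f"DDR2 needed to match XDR: {ddr2_needed_rate:.0f} MT/s "
      f"({ddr2_needed_clock:.0f} MHz base clock)")
```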
If/when retail boards are released using XDR, it could be an exciting matchup. Prescott at 4 GHz could make very good use of added memory bandwidth, I bet. Integrated graphics with a 12.8 GB/s memory subsystem might actually not suck that hard! :)
Malladine - Thursday, February 19, 2004 - link
Oops...actually wanted to ask #13 if that means that PC3200 is 1.6 GB/s single channel?
Malladine - Thursday, February 19, 2004 - link
bhtooefr - Thursday, February 19, 2004 - link
#3, he said dual channel. Single channel IS 3.2GB/s, but dual channel is 6.4. I was going to point out that DC-DDR was the same speed as XDR.
(my own comment) Remember when Northwood came out, and it didn't have HyperThreading enabled, but later releases enabled it? Well, I wouldn't be surprised if the P4-F or P4-G is a Prescott-64.
KalTorak - Thursday, February 19, 2004 - link
The processor data bus has been 64 bits wide since the original Pentium processor, as I recall.
AgaBooga - Thursday, February 19, 2004 - link
I predicted by reading Anand's articles months ago that the 64-bit feature would be a lot like the Hyper-Threading not enabled in the Willamette cores. I'm expecting something similar to happen this time around. I think Intel's timing argument is true to an extent, but had 64-bit mode helped performance, or at least not hurt it much, they might have released it enabled.
Pumpkinierre - Wednesday, February 18, 2004 - link
Even if you do enable 64-bit functionality in Prescott, won't you need a mobo or at least a BIOS upgrade to handle it? You probably don't need the memory size extension on the address bus, but I don't know the size of the data bus on the Prescott. If it is only 32 bits wide, then it would need to carry out two fetches for full 64-bit functionality (plus internal 64-bit manipulation), but this would require a change to the microcode in the BIOS. Unless this is already present in Prescott BIOS upgrades for mobos (i875/865), you may have difficulties even if you 'switch on' the x86-64 commands. I suspect they are not going to enable it until Socket 775.
Jeff7181 - Wednesday, February 18, 2004 - link
One thing I wonder about is how flexible x86-64 is. Could it go through a revision that drops support for 32-bit instructions to enhance 64-bit performance when 64-bit software is the only software you can buy?
KalTorak - Wednesday, February 18, 2004 - link
And given the iffy bandwidth available at Moscone West, I think Derek's doing pretty well to get these reports in :)
The XDR stuff _was_ pretty cool; I didn't realize there was a clock source with low enough jitter to make that thing work.