Intel Developer Forum Spring 2004 - Day 1: Wider and Faster
by Derek Wilson on February 18, 2004 10:41 AM EST - Posted in Trade Shows
Nocona and Prescott: Intel adopts x86-64
There was an incredible amount of speculation that Intel would make a formal announcement about their use of AMD's 64 bit extensions to the x86 ISA at IDF this week. Intel has indicated for a long while now that they would adopt 64 bit for the desktop "when it makes sense." Of course, the time frame we have been given for such a thing making sense has always been much further down the road than this year, and that tradition continues. Even though Intel has announced that its Nocona (90nm Xeon) processor will have 64 bit x86 extensions enabled, they are targeting it squarely at the workstation/server market and have still not decided to bring x86-64 to the desktop.
Intel can release x86-64 in a P4 form any time it wants (though that may end up being later rather than sooner), since the Nocona processor is a Prescott core with its 64 bit hardware enabled in a Xeon package (and with Xeon-sized caches and features). We are looking into how Intel has disabled the 64 bit extensions in current versions of Prescott, but we don't have conclusive data yet. We suspect, though, that the extensions are disabled in much the same way that clock speeds are locked (so that neither enthusiasts nor remarkers can add value not included straight from Intel).
As far as what is actually going on architecturally, we still need to do a little digging. We do know some things for a fact. Intel's implementation of the x86-64 extensions will be completely compatible with AMD's. The extensions are present in the current version of Prescott in a disabled state (and Intel is still determining an appropriate time to release a 64 bit enabled P4).
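Because Intel's implementation is meant to be compatible with AMD's, software should be able to detect the extensions the same way on both vendors' chips: by checking the long mode (LM) flag, bit 29 of the EDX value returned by CPUID extended leaf 0x80000001, after confirming via leaf 0x80000000 that this leaf exists. The Python sketch below only decodes register values passed in by the caller; actually executing CPUID requires assembly or a compiler intrinsic. The helper name and the sample register values are made up for illustration, and we are assuming (not confirming) that a Prescott with the extensions disabled simply reports this bit as 0.

```python
# Hypothetical helper: decodes raw CPUID register values; it does not execute CPUID.
LM_BIT = 29  # CPUID leaf 0x80000001, EDX bit 29: long mode (x86-64) available


def has_long_mode(max_extended_leaf: int, edx_80000001: int) -> bool:
    """Return True if the CPU reports x86-64 long mode support.

    max_extended_leaf: EAX returned by CPUID leaf 0x80000000
    edx_80000001:      EDX returned by CPUID leaf 0x80000001
    """
    if max_extended_leaf < 0x80000001:
        # Leaf 0x80000001 is not implemented, so the flag cannot be queried.
        return False
    # Assumption: a part with the extensions disabled reports this bit as 0.
    return bool((edx_80000001 >> LM_BIT) & 1)


# Made-up register values, for illustration only.
print(has_long_mode(0x80000008, 1 << LM_BIT))  # True
print(has_long_mode(0x80000008, 0))            # False
print(has_long_mode(0x80000000, 1 << LM_BIT))  # False: leaf not available
```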
We still have questions about Intel's ALU design and how it supports the new extensions, as well as whether or not Nocona will have a larger trace cache than Prescott. Needless to say, there are still plenty of things we don't know yet.
Up until last year (with the release of the Athlon 64), Intel had four options for more accessible 64 bit computing: bring the Itanium's EPIC (Explicitly Parallel Instruction Computing) based IA-64 to the desktop, develop a desktop 64 bit ISA built around fast emulation of x86, create its own 64 bit extensions to the x86 architecture, or adopt AMD's extensions to x86 for its future processors. Let's take a look at these options to try to understand how we got here today.
17 Comments
TrogdorJW - Thursday, February 19, 2004
If anyone is interested in reading more speculation on Prescott and how it gets 64 bits, I posted some of my *theories* over at the FiringSquad forums. Here's the link for the complete discussion: http://forums.firingsquad.com/firingsquad/board/me...
The important part is as follows:
----------------------------------
The big question now is: how well will Prescott-64 perform? I think that they can get the heat under control. (More speculation.) However, maximizing 64-bit performance might be a bit more difficult. Look at AMD with the instruction set extensions they've licensed from Intel: Intel still beats them in MMX, SSE, and now SSE2 performance (although AMD is getting closer with each new processor release).
Another interesting bit of news: Intel is apparently going to clock the ALUs (Arithmetic Logic Units) at core speed when running 64-bit code. Specifically, they quote 7 GHz in 32-bit mode and 4 GHz in 64-bit mode. That's a little odd, since the current ALUs run at twice the core speed in 32-bit mode, so 7 GHz would come from a 3.5 GHz processor. Why they would run at 4 GHz and not 3.5 GHz I couldn't say. Maybe because they can?
How will that affect performance? It depends on how the 64-bit extensions were added. If Intel used the same setup as the regular P4 core, with the only change being more registers, all widened to 64 bits, then it would likely hurt performance relative to 32-bit mode. However, it is *possible* that the 64-bit support was added as a completely separate module. If that is the case, it might have separate 64-bit ALUs/AGUs. For reference, the current NetBurst design has 7 functional units: two simple ALUs that run at 2X core speed, one complex ALU running at core speed, an FPU/SSE Move/Store unit, a full FPU/SSE unit that handles all of those operations, and two AGUs (Address Generation Units). The 64-bit extensions in Prescott/Nocona/Potomac (called "Clackamas Technology") could have their own AGUs and ALUs.
That would make sense to me, since as I mentioned in my earlier speculation, the Prescott core is currently about 73 million transistors compared to Northwood's 29 million. Northwood has 7 functional units with 20 pipeline stages, giving about 207,000 transistors per stage per unit (29 million / (20 * 7)). If the Prescott design simply extended the NetBurst architecture to 64 bits and 31 stages, it would be around 336,000 transistors per stage per unit (73 million / (31 * 7)). On the other hand, if the 64-bit extensions were added in a separate module with their own AGUs and ALUs, Prescott would have 11 functional units. That would give about 214,000 transistors per stage per unit (73 million / (31 * 11)). An even more radical approach might be to have three 64-bit AGUs and three 64-bit ALUs, for 13 units in total. Then you would only have about 181,000 transistors per stage per unit (73 million / (31 * 13)). (That's a little harder to believe, but since Intel is being forced to adopt AMD's instruction set, they might want to adopt the architecture as well for performance reasons.) Note: These are all very rough estimates. FPUs generally have more pipeline stages, and there are lots of other factors to consider, like the L1 cache and trace cache. This is just a baseline estimate.
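The back-of-the-envelope arithmetic above can be reproduced with a short Python sketch. The inputs are the rough figures quoted in the comment (29 and 73 million transistors, 20 and 31 stages, 7, 11, or 13 functional units); splitting the extra units into two or three ALU/AGU pairs is our reading of the text, not anything Intel has published.

```python
# Average transistors per pipeline stage per functional unit, using the
# commenter's rough estimates (not Intel data).


def per_stage_per_unit(transistors: float, stages: int, units: int) -> float:
    """Average transistor budget per pipeline stage per functional unit."""
    return transistors / (stages * units)


scenarios = {
    "Northwood, 20 stages, 7 units": (29e6, 20, 7),
    "Prescott, 31 stages, 7 units": (73e6, 31, 7),
    # Assumption: 4 extra units = 2 extra ALUs + 2 extra AGUs.
    "Prescott, 31 stages, 11 units": (73e6, 31, 11),
    # Assumption: 6 extra units = 3 extra ALUs + 3 extra AGUs.
    "Prescott, 31 stages, 13 units": (73e6, 31, 13),
}

for name, (transistors, stages, units) in scenarios.items():
    print(f"{name}: {per_stage_per_unit(transistors, stages, units):,.0f}")
```

Rounded, the four results land at roughly 207,000, 336,000, 214,000, and 181,000, which is why only the "extra units" scenarios keep Prescott near Northwood's per-stage budget.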
As I stated earlier, increasing the number of transistors by such a large amount without adding more functional units would make the Prescott design scale worse than the Northwood design. Why would Intel do that!? Going to 31 stages would have been done to decrease the average number of transistors per stage, and they would likely aim to be at worst about the same as the Northwood. I certainly don't know for sure what was done, but various rumors and the fact that the 32-bit and 64-bit ALUs run at different speeds make me wonder. I suppose we'll know more in about two or three months, if not before then.
TrogdorJW - Thursday, February 19, 2004
PC3200 is 3.2 GB/s single channel and 6.4 GB/s dual channel. XDR single channel is 6.4 GB/s, so in a dual-channel setup (which is very likely, since almost all past Rambus implementations other than the i820 were dual-channel, and we all know what a fiasco that was!) XDR will be 12.8 GB/s.
It is important to note that DDR normally uses a 64-bit bus, whereas RDRAM/XDR apparently use a 16-bit bus. Running 64 traces across a motherboard at high clock speeds is difficult at best, but if you cut that to 16 traces, it is not as hard. That's what Rambus was all about initially. Now, DDR is running at 200 MHz (400 effective) with 128 data traces in dual-channel operation. XDR is countering by running 16 traces at 400 MHz (3200 effective).
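Peak bandwidth is just bytes per transfer times transfers per second times channels. A minimal Python sketch, assuming the 64-bit DDR bus and 16-bit XDR channel described above and decimal gigabytes (10^9 bytes), reproduces the figures in the comment. The function name is hypothetical.

```python
# Peak bandwidth = (bus width in bytes) x (transfers per second) x channels.


def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers: float, channels: int = 1) -> float:
    """Peak bandwidth in decimal GB/s for a bus of the given width and transfer rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers * 1e6 * channels / 1e9


print(peak_bandwidth_gbs(64, 400))      # PC3200 (DDR400), single channel: 3.2
print(peak_bandwidth_gbs(64, 400, 2))   # PC3200, dual channel: 6.4
print(peak_bandwidth_gbs(16, 3200))     # XDR, single 16-bit channel: 6.4
print(peak_bandwidth_gbs(16, 3200, 2))  # XDR, dual channel: 12.8
```

The narrow XDR channel reaches the same bandwidth as a 64-bit DDR channel by running eight times as many transfers per second on a quarter of the data lines.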
I find it interesting that XDR's external clock is really 400 MHz, but internally they send eight bits per pin per clock. From what was said in the article, I guess they first multiply the clock by four, and then more or less use DDR tactics, sending a bit on both the rising and falling edges of the clock. The end result, though, is the same. DDR2 does something similar, I believe. "1 GHz" DDR2 is really running at 250 MHz, with four bits per clock. So they double the clock and then send data on both the rising and falling edges.
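The clock arithmetic described here can be written as a simplified model: external clock, times an internal multiplier, times two transfers per cycle for the rising and falling edges. This follows the commenter's guess at how the multiplication works, not Rambus or JEDEC documentation, and the function name is hypothetical.

```python
# Simplified model: data rate = external clock x internal multiplier x 2 edges.


def data_rate_mts(ext_clock_mhz: float, multiplier: int) -> float:
    """Transfers per second (in millions) per pin under the multiply-then-DDR model."""
    edges_per_cycle = 2  # DDR-style: one transfer on the rising edge, one on the falling edge
    return ext_clock_mhz * multiplier * edges_per_cycle


xdr = data_rate_mts(400, 4)   # 3200 MT/s from a 400 MHz external clock
ddr2 = data_rate_mts(250, 2)  # 1000 MT/s, the "1 GHz" DDR2 example

# Bits per pin per external clock cycle: 8 for XDR, 4 for DDR2.
print(xdr, ddr2, xdr / 400, ddr2 / 250)
```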
In order to match XDR, DDR2 would have to run at 200 MHz and an effective 800 MHz. We're seeing that on graphics cards, but it looks like that is still a ways off for motherboards. The latency question is still not really being answered by Rambus. "Low latency" at 3.2 GHz effective speeds could mean anything. I have seen that DDR2 is only offering CAS Latencies of 3, 4, and 5. I wonder what the equivalent XDR latency is - probably something like 6, 8, and 10.
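Working backwards gives the DDR2 figure: a 64-bit (8-byte) bus needs 6.4 GB/s divided by 8 bytes, or 800 million transfers per second, to match single-channel XDR, and with four transfers per clock (the commenter's DDR2 model) that is a 200 MHz clock. A minimal sketch, with a hypothetical helper name and decimal gigabytes assumed:

```python
# Solve for the transfer rate a bus needs in order to hit a target bandwidth.


def required_rate_mts(target_gbs: float, bus_width_bits: int) -> float:
    """Transfer rate (millions per second) needed to reach target_gbs (decimal GB/s)."""
    bytes_per_transfer = bus_width_bits / 8
    return target_gbs * 1e9 / bytes_per_transfer / 1e6


xdr_single_channel_gbs = 6.4  # 16-bit channel x 3200 MT/s
rate = required_rate_mts(xdr_single_channel_gbs, 64)  # 64-bit DDR2 bus
core_clock = rate / 4  # four transfers per clock, per the commenter's DDR2 model

print(round(rate), round(core_clock))  # 800 200
```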
If/when retail boards are released using XDR, it could be an exciting matchup. Prescott at 4 GHz could make very good use of added memory bandwidth, I bet. Integrated graphics with a 12.8 GB/s memory subsystem might actually not suck that hard! :)
Malladine - Thursday, February 19, 2004
Oops... actually, I wanted to ask #13 if that means PC3200 is 1.6 GB/s single channel?
Malladine - Thursday, February 19, 2004
bhtooefr - Thursday, February 19, 2004
#3, he said dual channel. Single channel IS 3.2 GB/s, but dual channel is 6.4 GB/s. I was going to point out that DC-DDR was the same speed as single-channel XDR. (My own comment:) Remember when Northwood came out without Hyper-Threading enabled, and later releases enabled it? Well, I wouldn't be surprised if the P4-F or P4-G is a Prescott-64.
KalTorak - Thursday, February 19, 2004
The processor data bus has been 64 bits wide since the original Pentium processor, as I recall.
AgaBooga - Thursday, February 19, 2004
I predicted, from reading Anand's articles months ago, that the 64-bit feature would be a lot like Hyper-Threading, which was not enabled in the Willamette cores. I'm expecting something similar to happen this time around. I think Intel's timing argument is true to an extent, but had 64-bit helped performance, or at least not decreased it by much, they might have released it enabled.
Pumpkinierre - Wednesday, February 18, 2004
Even if you do enable 64-bit functionality in Prescott, won't you need a mobo or at least a BIOS upgrade to handle it? You probably don't need the memory size extension on the address bus, but I don't know the size of the data bus on the Prescott. If it is only 32 bits wide, then it would need to carry out two fetches for full 64-bit functionality (plus internal 64-bit manipulation), but this would require a change to the microcode in the BIOS. Unless this is already present in Prescott BIOS upgrades for mobos (i875/865), you may have difficulties even if you 'switch on' the x86-64 instructions. I suspect they are not going to enable it until Socket 775.
Jeff7181 - Wednesday, February 18, 2004
One thing I wonder about is how flexible x86-64 is. Could it go through a revision that drops support for 32-bit instructions to enhance 64-bit performance once 64-bit software is the only software you can buy?
KalTorak - Wednesday, February 18, 2004
And given the iffy bandwidth available at Moscone West, I think Derek's doing pretty well to get these reports in :) The XDR stuff _was_ pretty cool; I didn't realize there was a clock source with low enough jitter to make that thing work.