IDF Spring 2005 - Predicting Future CPU Architecture Trends
by Anand Lal Shimpi on March 3, 2005 7:43 PM EST. Posted in Trade Shows.
The successor to Pentium 4 is...
For quite some time now, we've been trying to figure out what the successor to the Pentium 4's NetBurst architecture will be. When the Pentium M was first released, everyone expected it to be the direct successor to the Pentium 4, but things obviously didn't work out that way.
Intel had Tejas, the successor to Prescott, ready to go, but by that time it was clear that the path they had chosen for the Pentium 4 had come to an end, limited by power. The Pentium M was a reasonable competitor, but not exactly a revolutionary successor to the Pentium 4. Based on our conversations and experiences at IDF, we're finally able to start piecing together what the eventual successor to the Pentium 4 will be. Remember that the Pentium 4 architecture will continue to exist throughout 2005 as the Pentium D and Pentium Extreme Edition, but with Intel's decision to drop the number 4, it's clear that the company is ready for a departure from the Pentium 4 brand and architecture.
The first question has always been pipeline depth: will the successor to NetBurst have a long pipeline like Prescott's, or a short pipeline like the Pentium M's? The answer appears to be somewhere in between Pentium M and Prescott, realistically being much closer to Willamette's 20 stage integer pipeline than Prescott's 31 stage pipe, for strictly power reasons. Intel is no longer doing as much research in branch prediction as they once were, indicating an end to the extreme pipeline growth that we've seen since the introduction of the Pentium 4. There has been a lot of research into areas such as continuous flow pipelines, but it's unclear whether that sort of technology will make its way into the next iteration of the Pentium line.
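To see why pipeline depth, power, and branch prediction are so tightly linked, a rough back-of-the-envelope model helps. The sketch below estimates the cycles lost per instruction to mispredictions as the pipe deepens; all of the figures in it are illustrative assumptions, not Intel data:

```python
# Rough model of branch-misprediction cost vs. pipeline depth.
# All numbers are illustrative assumptions, not measured Intel figures.

branch_freq = 0.20         # assume ~1 in 5 instructions is a branch
predictor_accuracy = 0.95  # assume a 95%-accurate branch predictor

for depth in (14, 20, 31):  # roughly Pentium M, Willamette, Prescott
    # A misprediction flushes the pipe, costing roughly its depth in cycles.
    lost = branch_freq * (1 - predictor_accuracy) * depth
    print(f"{depth}-stage pipeline: ~{lost:.2f} cycles lost per instruction")
```

Deeper pipes lose proportionally more work to each flush, so without ever-better predictors, the deeper design burns power refilling the pipeline rather than doing useful work.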
A lot of the lessons learned in the Pentium M will, of course, be applied to the NetBurst successor, with micro-ops fusion being mentioned quite frequently. Intel management is finally aware that clock speed isn't the sole seller of CPUs, so this time around, they are more willing to design more elegant, higher-IPC cores at lower clock speeds. Much of this is due to the success of Centrino, which is also part of the reason for the switch from the Pentium 4 brand name to plain Pentium: Pentium may very well become a platform much like Centrino.
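As a refresher, micro-ops fusion lets certain µop pairs, such as a load plus the ALU operation that consumes it, travel through most of the pipeline as a single tracked entry, saving power and bookkeeping resources. Here is a toy sketch of the idea; the µop names and the fusion table are simplified assumptions, not the Pentium M's actual rules:

```python
# Toy illustration of micro-ops fusion: certain µop pairs travel the
# pipeline as one tracked entry instead of two. The µop names and the
# fusion table are simplified assumptions, not the Pentium M's real rules.

FUSIBLE = {("load", "add"), ("store_addr", "store_data")}

def fuse(uops):
    fused, i = [], 0
    while i < len(uops):
        if i + 1 < len(uops) and (uops[i], uops[i + 1]) in FUSIBLE:
            fused.append(uops[i] + "+" + uops[i + 1])  # one slot for the pair
            i += 2
        else:
            fused.append(uops[i])
            i += 1
    return fused

stream = ["load", "add", "store_addr", "store_data", "inc"]
print(fuse(stream))  # ['load+add', 'store_addr+store_data', 'inc']: 3 slots, not 5
```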
For the next-generation desktop microarchitecture, Intel still appears to be committed to the current style of big out-of-order cores, meaning that we won't see any Cell-style architectures from Intel this time around. For the most part, we think this makes a lot of sense at present, given the applications currently being run. Intel's thinking is this: if they were to move immediately to a simpler core architecture and use a large number of such cores in parallel, that would leave too much opportunity for another company to build a CPU made up of fewer, more powerful cores, which would perform better on current applications, or at least be easier to program for.
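Amdahl's law captures the trade-off Intel is describing. The sketch below compares a couple of big, fast cores against many simple, slower ones; the workload mix and relative core speeds are illustrative assumptions:

```python
# Amdahl's-law comparison of a few fast cores vs. many slow ones.
# The workload mix and relative core speeds are illustrative assumptions.

def speedup(parallel_fraction, cores, core_speed):
    serial_time = (1 - parallel_fraction) / core_speed
    parallel_time = parallel_fraction / (core_speed * cores)
    return 1 / (serial_time + parallel_time)

for p in (0.3, 0.9):  # fraction of the work that parallelizes
    big = speedup(p, cores=2, core_speed=1.0)     # two big out-of-order cores
    small = speedup(p, cores=16, core_speed=0.4)  # many simple, slower cores
    print(f"parallel={p:.0%}: 2 big cores {big:.2f}x, 16 small cores {small:.2f}x")
```

On a mostly serial workload the two big cores win easily; the sea of small cores only pulls ahead once the code is overwhelmingly parallel, which today's desktop software is not.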
In the generation after the Pentium 4's successor, things may change: Intel has talked about pairing a handful of big cores with multiple smaller cores for more specific, extremely parallel workloads. Intel's view of its microprocessors starting at around 2010 begins to look a lot like Cell. Cell, in other words, may simply turn out to be a bit ahead of its time in the marketplace.
Hyper-Threading (SMT) will not die with the Pentium 4. In fact, the number of threads per core will go from two up to four before the end of the decade. The move to eight threads per core won't happen anytime soon, however: there is apparently a pretty sizeable performance gain from enabling four threads per core, but not nearly as much when going from four to eight.
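A simple utilization model shows why the returns diminish: each added thread can only fill issue slots the previous threads left idle. The single-thread utilization figure below is an illustrative assumption, not Intel data:

```python
# Toy throughput model for SMT: each added thread can only fill issue
# slots the previous threads left idle. The single-thread utilization
# figure is an illustrative assumption, not Intel data.

base_utilization = 0.45  # assume one thread keeps ~45% of issue slots busy

for threads in (1, 2, 4, 8):
    util = 1 - (1 - base_utilization) ** threads
    print(f"{threads} thread(s) per core: ~{util:.0%} of peak throughput")
```

Under these assumptions, going from two to four threads recovers a large slice of idle throughput, while going from four to eight fights over the little idle capacity that remains.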
Larger and software-controlled caches will be much more common going forward, which is also eerily similar to the Cell architecture (Cell's SPEs have only local memory, which is essentially the idea of a software-controlled cache taken to its extreme).
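With a software-controlled cache, the program stages data into fast local memory explicitly instead of relying on hardware to fetch lines on a miss. Below is a minimal sketch of the double-buffering pattern this enables, Cell-SPE style; the transfer functions are hypothetical stand-ins for an asynchronous DMA interface:

```python
# Sketch of double buffering against a software-managed local store, the
# pattern Cell's SPEs use. start_transfer/wait are hypothetical stand-ins
# for an explicit, asynchronous DMA interface.

def process(block):  # placeholder compute kernel
    return [x * 2 for x in block]

def start_transfer(source, offset, size):
    return source[offset:offset + size]  # stands in for kicking off a DMA

def wait(transfer):
    return transfer  # stands in for DMA completion

def stream_through_local_store(data, block_size):
    results = []
    pending = start_transfer(data, 0, block_size)  # prefetch block 0
    for offset in range(0, len(data), block_size):
        current = wait(pending)                    # block is now local
        nxt = offset + block_size
        if nxt < len(data):                        # overlap the next fetch
            pending = start_transfer(data, nxt, block_size)
        results.extend(process(current))           # compute on the local copy
    return results

print(stream_through_local_store(list(range(8)), block_size=4))
```

The payoff is that the fetch of the next block overlaps the computation on the current one, hiding memory latency by software scheduling rather than by hardware speculation.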
You can expect a continued focus on SIMD performance as well; a perfect example is the set of SIMD improvements in Yonah's core that we reported on earlier.
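As a refresher, SIMD means one instruction operating on a packed vector of data rather than on a single element; SSE, for instance, handles four single-precision floats per instruction. A minimal sketch of the scalar-versus-vector distinction, using NumPy's whole-array operations as a rough stand-in for hand-written SSE code:

```python
# Scalar loop vs. SIMD-style vector operation. NumPy's whole-array
# expressions are only a rough stand-in for hand-written SSE code, but
# they show the same idea: one operation applied to packed data.
import numpy as np

a = np.arange(1024, dtype=np.float32)
b = np.arange(1024, dtype=np.float32)

# Scalar: one multiply-add per element, one element at a time.
scalar = np.empty_like(a)
for i in range(len(a)):
    scalar[i] = a[i] * b[i] + 1.0

# Vector: the same multiply-add expressed over whole arrays at once
# (SSE would process four single-precision floats per instruction).
vector = a * b + 1.0

assert np.allclose(scalar, vector)
```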
Although we're quite convinced that an on-die memory controller would result in the best performance per transistor expended on a new architecture, we're doubtful that Intel would consider one. We may have to wait until stacked die and wafer technology before we see any sort of serious reduction in memory latency through techniques other than more caches and more cores.
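The appeal of an on-die controller is easy to quantify with the standard average memory access time (AMAT) formula: trimming the chipset hop out of the miss penalty helps in direct proportion to the miss rate, which is also why more cache (a lower miss rate) attacks the same term from the other side. The latency figures below are illustrative assumptions, not measured parts:

```python
# Average memory access time: AMAT = hit_time + miss_rate * miss_penalty.
# All latency figures are illustrative assumptions, not measured parts.

l2_hit_time = 10       # cycles, assumed L2 hit latency
l2_miss_rate = 0.05    # assumed fraction of accesses that miss L2

off_die_penalty = 200  # assume a trip through an external MCH to DRAM
on_die_penalty = 140   # assume the same trip minus the chipset hop

for name, penalty in (("off-die controller", off_die_penalty),
                      ("on-die controller", on_die_penalty)):
    amat = l2_hit_time + l2_miss_rate * penalty
    print(f"{name}: average access ~{amat:.1f} cycles")
```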
22 Comments
Doormat - Friday, March 4, 2005
I'm thinking it's due to the fact that they make their own chipsets. Intel sells chipsets for $40 or so (MCH+ICH), and their uptake of new RAM technologies is quick (well, faster than AMD's, especially with DDR to DDR2). Plus the engineering cost. It doesn't add up. Unless and until they design a new chip from the ground up, an on-die memory controller is a lot of work for not a lot of money. Unless they manage to fall far behind AMD in terms of performance, I don't think it'll show up.

sprockkets - Friday, March 4, 2005
Gee, let's do everything possible to improve the situation except, oh shit, ditch x86 code that was created around 30 years ago.

elecrzy - Friday, March 4, 2005
#6, intel's reasoning doesn't make sense. they seem to make people change mobos, not because of differing ram standards, but because they change cpu sockets so damn often.

mkruer - Friday, March 4, 2005
#7 Sure, as if Intel's CPUs are not expensive enough, now you want to add another $15 "integrated on-die memory controller" tax.
bersl2 - Friday, March 4, 2005
You know, I saw enough flashy graphics in three days to make my head spin, and there were enough pictures of the future to make me think this was a World's Fair. Though what can one expect out of an event like this?

xsilver - Thursday, March 3, 2005
#6 what you said doesn't make sense from a performance perspective... how long does it take for new ram standards to come out? there has been sdram (pc66, 100, 133), ddr ram (pc2100, 2700, 3200), and now ddr2 (533). oh, and rambus 600, 800, 1000. that's 10 ram standards for pcs going back as far as the pentium 200... the memory controller on the AMD64 has already been updated from HTT 800MHz to HTT 1000MHz... and can be continually revised and introduced on newer steppings of the same cpus... eg. amd's forthcoming "e" spec with sse3, 4x ddr3200 support and other stuff for free
and #5 -- LOL -- so true -- AMD mobos are so cheap, it's not funny (not including the nforce4, but that's another issue)
maybe intel could just charge the extra $15 on their cpus :P
IntelUser2000 - Thursday, March 3, 2005
Well, Intel said they are not supporting an integrated memory controller because you have to change the board, the memory, and the CPU every time a new RAM standard comes out. Looking at desktops, it makes sense not to have integrated memory controllers, but for servers they have a solution. Maybe because it's more flexible to have a separate memory controller? I mean, you are pretty limited when the memory controller is integrated, in terms of clock speed scaling, increased complexity, and memory standards. It makes sense for servers though, and Intel recently announced that the Xeon MPs and the Itaniums would have common sockets (same sockets) and an integrated memory controller.

Anyway, this was interesting:
"The answer appears to be somewhere in between Pentium M and Prescott, realistically being much closer to Willamette's 20 stage integer pipeline than Prescott's 31 stage pipe, for strictly power reasons."
See, like I predicted, it's best to combine Pentium 4 Northwood and Pentium M together.
mkruer - Thursday, March 3, 2005
#4 Nah, then Intel can't charge an additional $15 per northbridge chip.
xsilver - Thursday, March 3, 2005
is there a more detailed reason as to WHY intel does not go with the on-die memory controller? has AMD patented it and are they unwilling to license it?
hasn't the success of the amd64 physically shown that the memory controller is highly effective in improving performance?
alangeering - Thursday, March 3, 2005
"Although we're quite convinced that an on-die memory controller would result in the best performance per transistor expended on a new architecture, we're doubtful that Intel would consider one. We may have to wait until stacked die and wafer technology before we see any sort of serious reduction in memory latency through techniques other than more caches and more cores."Well noted, but a little expansion: the latency drop when going to a stacked die/wafer technology comes from 2 things.
1. Proximity to core
2. Intel will have to provide an on-die memory controller... to have an external controller and stacked wafer RAM would be poor engineering.
So, expect to see these things together.