Intel Developer Forum Spring 2004 - Wrapup
by Derek Wilson on February 23, 2004 8:44 PM EST - Posted in Trade Shows
The Information Age
Pat Gelsinger's presentation began with a bit of a history lesson taking us from kilobytes to gigabytes. What followed was a discussion of how Intel will attack the "Era of Tera," as Pat dubbed the next step up.
Much of the rest of the keynote was geared toward answering the questions of why we need "tera" anything, and how Intel plans to approach the problem of achieving such high performance computing. The answer to the first question came back to reflect what Sean Maloney had said about broadband pushing computer systems to their limit: recognition, mining, and synthesis of data (for which Gelsinger used the poorly chosen acronym RMS). Essentially, Gelsinger is saying that the "Era of Tera" will allow us to operate quickly on massive data sets to identify very complex patterns and situations, as well as help us generate data that blurs the lines of reality.
As examples of usefulness, Pat mentioned that computers were able to detect the possibility of what happened on September 11, but they were a week or two late in doing so. Data mining would allow us to do such things as search the web for image data based on what the image actually looked like. As for an example of synthesis, we were shown a demo of realtime raytracing. Visualization being the infinitely parallelizable problem that it is, this demo was a software renderer running on a cluster of 23 dual 2.2GHz Xeon processors. The world will be a beautiful place when we can pack this kind of power into a GPU and call it a day.
Of course, we still need to answer the question of how we are going to get from here to there. As surprising as it may seem, Intel's answer isn't to push for ever-increasing frequencies. With some nifty charts and graphs, Pat showed us that we wouldn't be able to rely on increases in clock frequency giving us the same increases in performance as we have had in the past. One graph showed the power density of Intel processors approaching that of the sun if it remains on its current trend, and another showed that the faster a processor runs, the more cycles it wastes waiting for data from memory (since memory latency hasn't decreased at the same rate as clock speed has increased). Also, as chips are fabbed with smaller and smaller processes, increasing clock speeds will lead to problems with moving data across a chip in less than one clock cycle (because of interconnect RC delays).
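To put the memory latency point in concrete terms, here is a minimal back-of-the-envelope sketch. The 100 ns latency figure and the clock speeds are our own hypothetical numbers, not values from Pat's graphs; the only claim is that a fixed latency in nanoseconds costs more cycles as the clock gets faster.

```python
def stall_cycles(memory_latency_ns: float, clock_ghz: float) -> float:
    """Cycles a core sits idle while a single memory access is outstanding.

    One nanosecond at 1 GHz is exactly one cycle, so the product of the
    latency in ns and the clock in GHz gives the stall in cycles.
    """
    return memory_latency_ns * clock_ghz


# Hypothetical fixed memory latency (an assumption for illustration only).
LATENCY_NS = 100.0

for ghz in (1.0, 2.0, 4.0):
    cycles = stall_cycles(LATENCY_NS, ghz)
    print(f"{ghz:.1f} GHz: {cycles:.0f} cycles lost per trip to memory")
```

Doubling the clock while memory stays put doubles the number of cycles thrown away on every miss, which is the trend the graph was illustrating.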
In addition to clock speed not being able to pull us out of the mud, architectural advances in processors are limited by the maximum instruction level parallelism (ILP) available in any given program (the maximum amount of work a processor can do per cycle is limited because not all instructions can be completed in parallel: some instructions are dependent on the results of other instructions). Since the average maximum ILP isn't increasing in programs, we will need to find another way to increase the performance of a processor.
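The ILP ceiling can be illustrated with a toy model. The sketch below is our own simplification, not anything Intel presented: it assumes every instruction takes one cycle and that the processor has unlimited execution units, so the only limit is the longest chain of dependent instructions. The function name `max_ilp` and the instruction names are hypothetical.

```python
def max_ilp(deps: dict) -> float:
    """Upper bound on instructions per cycle for a tiny program.

    `deps` maps each instruction name to the list of instructions whose
    results it needs. With unit latency and unlimited execution units,
    the program cannot finish faster than its longest dependency chain.
    """
    depth = {}

    def chain_length(instr: str) -> int:
        # Length of the longest dependency chain ending at `instr`.
        if instr not in depth:
            depth[instr] = 1 + max(
                (chain_length(d) for d in deps[instr]), default=0
            )
        return depth[instr]

    critical_path = max(chain_length(i) for i in deps)
    return len(deps) / critical_path


# Four instructions with no dependencies: all can issue together.
independent = {"a": [], "b": [], "c": [], "d": []}

# Four instructions where each needs the previous result: fully serial.
chained = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}

print(max_ilp(independent))  # 4.0
print(max_ilp(chained))      # 1.0
```

No amount of extra execution hardware helps the chained case, which is why a wider core eventually stops paying off on real code.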
If clock frequency isn't going to get us anywhere, and we are hitting a wall with increasing how many instructions per cycle we can complete in a single program, the only other option is to increase parallelism on the thread level. Rather than trying to get more done in a single program or thread, we will have to have multiple processors running independent code at the same time. Intel's first step in this direction was the baby step of Hyper-Threading, but dual core, multicore and massively multicore processors are on the horizon for Intel.
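For readers who haven't written threaded code, the basic shape of thread level parallelism looks something like the sketch below: split the work into independent chunks, hand each chunk to its own worker, and combine the results. This is only an illustration of the structure. The function names are hypothetical, and in standard CPython the global interpreter lock keeps pure Python threads from actually running on multiple cores at once, so don't expect a speedup from this particular example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk: range) -> int:
    """Work done by one thread: sum of squares over its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(n: int, workers: int = 4) -> int:
    """Sum of x*x for x in [0, n), split across `workers` threads."""
    # Ceiling division so the chunks cover the whole range; at least 1
    # so that range() never receives a zero step.
    step = max(1, -(-n // workers))
    chunks = [range(start, min(start + step, n)) for start in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is independent, so no thread waits on another.
        return sum(pool.map(partial_sum, chunks))


print(parallel_sum_of_squares(10, workers=3))  # 285
```

The hard part in real software is that most work doesn't split into independent chunks this cleanly.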
In addition to massively multicore architectures, Intel needs to eliminate bottlenecks from other parts of the system as well. One of the ways they plan on doing this is via a feature called Helper Threads. Apparently, half of the execution time of any given process is spent waiting for data. If that data could be available in the cache for the process when it was needed, everything would run much faster. Helper Threads are apparently able to warm up the cache for a specific process at the points where it would normally have a cache miss. In the demo of Helper Threads, Intel ran a benchmark on an Itanium processor and a "research Itanium," and we saw an 8.9% speedup and 23% fewer cache misses from the Helper Thread enabled side.
One of the other paths Intel is looking down is adaptability. Adaptive body biasing (forward biasing a transistor when it is on, and reverse biasing when it is off) to increase performance and decrease power lost to leakage is being explored on the silicon level. On the large scale, adaptive architectures and platforms are being explored. Reconfigurable architectures such as adaptive wireless radio arrays that can be easily reconfigured to work with multiple types of wireless networks are another example of the kind of adaptability Intel wants to see evolve in the future.
By utilizing massive multiprocessing and adaptive/programmable architectures, the hope is that systems will be able to form themselves to the needs of the programs they are running while doing as many things as possible in any given nanosecond (or part thereof as the case may be).
Of course, that's the future. Dual core processors aren't even going to be showing up this year (though next year might be a different story if we are lucky), and reconfigurable and adaptive computing has been discussed for a very long time. It is very exciting to hear what Intel's farthest looking people have to say about where we are headed, but it also serves to make us feel a little bit like the next few years will be an eternal day-before-Christmas.
9 Comments
TrogdorJW - Tuesday, February 24, 2004 - link
Ugh... IPS was supposed to be IPC.
IPS has been proposed as an alternative to MHz as a processor speed measurement (Instructions Per Second = IPC * MHz), but figuring out the *average* number of instructions per clock is likely to bring up a whole new set of problems.
TrogdorJW - Tuesday, February 24, 2004 - link
The AMD people will probably love this quote:
"We still need to answer the question of how we are going to get from here to there. As surprising as it may seem, Intel's answer isn't to push for ever increasing frequencies. With some nifty charts and graphs, Pat showed us that we wouldn't be able to rely on increases in clock frequency giving us the same increases in performance as we have had in the past. The graphs showed the power density of Intel processors approaching that of the sun if it remains on its current trend, as well as a graph showing that the faster a processor, the more cycles it wastes waiting for data from memory (since memory latency hasn't decreased at the same rate as clock speed has increased). Also, as chips are fabbed with smaller and smaller processes, increasing clock speeds will lead to problems with moving data across around a chip in less than one clock cycle (because of interconnect RC delays)."
Of course, this is nothing new. Intel has been pursuing clock speed with P4 and parallelism with P-M and Itanium. In an ideal world, you would have Pentium M/Athlon IPS with P4 clock speeds. Anyway, it looks like programmers (WOOHOO - THAT'S ME!) are going to become more important than ever in the future processor wars. Writing software to properly take advantage of multiple threads is still an enormously difficult task.
Then again, if game developers for example would give up on the "pissing contest" of benchmarks and code their games to just run at a constant 100 FPS max, it might be less of an issue. If CPUs get fast enough that they can run well over 100 fps on games, then they could stop being "Real Time Priority" processes.
It really irks me that most games suck up 100% of the processor power. If I could get by with 30% processor usage and let the rest be multi-tasked out to other threads while maintaining a good frame rate, why should the game not do so? This is especially annoying on games that aren't real-time, like the turn-based strategy games.
TrogdorJW - Tuesday, February 24, 2004 - link
"As for an example of synthesis, we were shown a demo of realtime raytracing. Visualization being the infinitely parallelizable problem that it is, this demo was a software renderer running on a cluster of 23 dual 2.2GHz Xeon processors. The world will be a beautiful place when we can pack this kind of power into a GPU and call it a day."
Heheheh.... I like that. It's a real-time raytracing demo! Woohoo! I've heard people talk about raytracing being a future addition to graphics cards. If you assume that the GPU with specialized hardware could do raytracing ten times faster than the software on the Xeons, we'll still need 5 GHz graphics chips to pull it off. Or two chips running at 2.5 GHz? Still, the thought of being able to play a game with Toy Story quality graphics is pretty cool. Can't wait for 2010!
Shuxclams - Tuesday, February 24, 2004 - link
Oops, no comment before. Am I seeing things or do I see a southbridge, northbridge and memory controller?
SHUX
Shuxclams - Tuesday, February 24, 2004 - link
HammerFan - Tuesday, February 24, 2004 - link
Intel probably won't use an onboard mem controller for a long time...I've heard that their first experiences with them weren't good. Also, the northbridges are way too big to not have a mem controller on board.
*new topic*
That BTX case looks wacky to me...why such a big heatsink for the CPU?
*new topic*
I have the same question Cygni had: Are there any CTs in these pictures, or are there none out-and-about yet?
Ecmaster76 - Tuesday, February 24, 2004 - link
I counted eight dimms on the first board and either six or eight on the second one. Dual core memory controller? If so it would help Intel keep the Xeon from being spanked by Opteron as they scale.
capodeloscapos - Tuesday, February 24, 2004 - link
Quote: "It is possible that future games (and possibly games ported by lazy console developers) may want to use the CPU and main memory a great deal and therefore benefit from PCI Express"
cough!, Halo, Cough!, Colin McRae 3, cough!...
:)
Cygni - Tuesday, February 24, 2004 - link
I like the attempt to hide the number of DIMM slots... but I think it's still pretty easy to tell how many are there, because the tops of the slots are still showing, as well as a little of the bottom of the last slot.
So, is Intel trying to hide that Lindenhurst is 64bit (XeonCE) compatible, or am I off base here?