This is pretty important for AMD, because they've had a terrible time matching Intel's per-cycle efficiency. Ever since the Core i series launched, AMD has been behind in single-threaded performance, especially clock for clock. They've tried to make up for it by offering more cores/threads than comparably priced Intel parts, and by scaling up their clock rates. Unfortunately for AMD, their last few generations have fallen short of those clock-rate targets; as a result, while a comparably priced AMD chip would typically have a clock-speed advantage over its Intel counterpart, it was often not enough to overcome Intel's efficiency.
This product release is kind of an attempt to show that AMD can actually deliver on their planned strategy.
As for why AMD is going this route rather than trying to beat Intel in per-clock efficiency? Probably because AMD's resources are severely limited compared to Intel's, and this approach offered lower risk at lower cost.
> As for why AMD is going this route rather than trying to beat Intel in per-clock efficiency?
Well, because clock speeds are something they can improve now, and not in $n years' time when their next major microarchitecture is ready. Intel had the exact same problem with the Pentium 4; they were similarly stuck with minor tweaks and desperately increasing clock rates for years before Core was ready.
"Per-clock efficiency" is generally not a goal in itself in CPU design; absolute performance and efficiency are. Where AMD has stumbled is in getting the clock up, probably partly related to their unfortunate fab situation.
The speed-demon strategy has seen successes historically, the Pentium 4's fate notwithstanding. See e.g. the DEC 21164 and the IBM z196.
That is what Steamroller will do: increase IPC. This is an interim fix. If they can run Steamroller at these clocks and Intel doesn't do something radical, AMD will be "top dog" again for the first time since Athlon 64s were killing P4s.
Based on how Bulldozer performed, it'll end up being: sure, it's 5 GHz, but did we mention all our instructions now take twice as many clocks?
Virtualization host performance on our 8-core Bulldozer (ESXi 5.1, private KBs from VMware trying to help, 32 GB RAM, RAID-10 ZFS SAN) was so bad (think P4 era) that we finally tracked down how to force the CPU into only using 4 cores, one per real FP unit.
The reality is that there is no mainstream scheduler out there that can efficiently use cores set up like that, especially with the long pipelines. I'm not saying it can't be done, but what improvements have been made have been minimal, or confined to academic/not-a-real-OS situations.
That's why Intel ships a compiler, duh.
It is true that the number one thing holding that part back was raw clock speed (as long as you view it more like a 4-core, 8-thread part a la Intel), but I've gone back to speccing Intel; it's just not worth being that much of a guinea pig for a firm that's basically trying to scrape by until the ARM64 parts start getting stamped.
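The "one core per real FP core" workaround described above can be sketched on Linux. This is a hypothetical illustration: the assumption that logical CPUs (0,1), (2,3), ... pair up into Bulldozer modules holds on many systems but should be verified with lscpu before trusting it.

```python
import os

# Bulldozer pairs two integer cores into a "module" that shares one FP unit.
# If logical CPUs (0,1), (2,3), ... map to modules (an assumption; verify
# with lscpu or /proc/cpuinfo), taking every even-numbered CPU gives one
# core per module, so FP-heavy threads never fight over a shared FPU.
def one_core_per_module(n_logical_cpus):
    return set(range(0, n_logical_cpus, 2))

mask = one_core_per_module(8)
print(mask)  # {0, 2, 4, 6}

# Pin the current process to that mask (Linux-only API):
if hasattr(os, "sched_setaffinity") and (os.cpu_count() or 0) >= 8:
    os.sched_setaffinity(0, mask)  # 0 = this process
```

A hypervisor obviously needs more than process affinity, but the same even-numbered-CPU masking is what the ESXi-side workaround amounts to.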
Indeed. We have a few zEnterprise systems in our corp data center (geeky enough looking that I want one in my basement), running with 5.5GHz chips. https://en.wikipedia.org/wiki/IBM_System_z
They didn't ship the first 64-bit CPU to run Windows either. Windows NT ran on Alpha (among other architectures). Granted the OS was still 32-bit, but the CPU wasn't.
But what do you expect from a press release? It's written by marketing trolls, not engineers.
Does it really qualify as a 5 GHz processor if it only runs at that speed in Turbo mode (which I assume can only kick in for a few milliseconds)? What is the "normal" speed it can maintain for more reasonable periods? How come this isn't mentioned in the press release?
A lot of that comparative slowness was caused by mechanical storage, replacing it with parallel and lower-latency SSDs makes it a lot easier to get full use out of more and faster cores. It doesn't cost much to set up a system that's completely CPU-limited on OLAP database-like workloads these days.
I suspect that as more software stops being optimized for ~10ms serial disk I/O with huge caches this will become more common and more and faster cores will be a big(er) deal.
Can you hear it? The shrieks of all those NH-D14s and Phanteks coolers screaming.
I would like to see a review, though. And pricing. If it has decent single-threaded performance, then with that number of cores and all next-gen games being multithreaded by default, it could be a compelling processor if it lands in the i7-4770 price range.
The problem is that clock doesn't really mean anything concrete in terms of real world performance. It's strictly a marketing thing.
For example, what if a chip used a 10 GHz clock for distribution and divided it down to 5 GHz everywhere it was actually used (not that I know of any reason to do such a thing besides marketing)? Would it be marketable as a 10 GHz chip? The manufacturer would certainly be in hot water if enthusiasts ever found out...
Even without such contrived scenarios, CPUs get different amounts of stuff done per clock.
Something I keep seeing, even on Slashdot and Hacker News, is the idea that a CPU that has to clock higher for a given performance will use more power. It seems to me that if you've got double the clock, the likely explanation is that half as many transistors are switching per clock, and power consumption should be orthogonal to the clock/IPC ratio.
If anyone's got any contrary ideas on that, I'd love to hear them. All I can think of is that higher clocks would correlate with longer pipelines, but Bulldozer's pipeline isn't even that long.
> is the idea that a CPU that has to clock higher for a given performance will use more power
This is like a dog whistle to the EEs; they're going to get all riled up by programmers with screwdrivers. You can model a stereotypical FET gate as a capacitor: all you're really doing is charging and discharging capacitors, either in FET gates or in the transmission line's theoretical capacitance. Right out of the C = Q/V definition of capacitance, mashed up against some Ohm's law and some algebra, you end up with P = C × V² × f. So you can see the intense excitement in lowering core voltages and making gates and lines smaller (lowering C), all in a tradeoff to improve the P/F (or F/P, whatever) ratio.
The important part is that it's pretty easy: right out of Ohm's law and the definition of capacitance, power is directly proportional to frequency.
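The P = C × V² × f relationship above is easy to play with numerically. The capacitance and voltage figures below are invented for illustration:

```python
# Dynamic CMOS switching power, P = C * V^2 * f; the capacitance and
# voltage numbers here are made up for illustration.
def dynamic_power_watts(c_farads, v_volts, f_hertz):
    return c_farads * v_volts ** 2 * f_hertz

base = dynamic_power_watts(1e-9, 1.2, 4e9)     # ~5.8 W
faster = dynamic_power_watts(1e-9, 1.2, 5e9)   # ~7.2 W: +25% clock, +25% power

# Higher clocks usually need a voltage bump too, and V enters squared, so
# real-world power climbs much faster than frequency alone suggests.
faster_v = dynamic_power_watts(1e-9, 1.35, 5e9)  # ~9.1 W

print(base, faster, faster_v)
```

The voltage-squared term is why a modest frequency gain that needs extra voltage costs disproportionate power.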
> It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock
Suppose CPU A has an adder that takes one clock cycle to run an add instruction. When two registers are added, the instruction goes through the entire adder in one clock cycle and affects, on average, some percentage of the transistors.
Suppose CPU B has a pipelined adder that takes two clock cycles to run an add instruction. When two registers are added, the instruction goes through half of the adder in one cycle and the other half in the next, affecting about half of that same percentage of the transistors each time. But this is a pipelined adder, and it doesn't just do one instruction at a time: during the first cycle, while our instruction is in the first part of the adder, some other add instruction is still going through the second part and affecting the other half of those transistors. And during the second cycle of our instruction, the next instruction is going through the first half. So even though any one instruction only affects half of the adder at a time, the entire adder still gets exercised every clock cycle.
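The two-adder argument above can be put in numbers with a toy switching model (the transistor count and clocks are invented):

```python
# Toy model of transistor switching activity; all figures are invented.
def switch_events_per_second(frac_active_per_cycle, transistors, clock_hz):
    return frac_active_per_cycle * transistors * clock_hz

T = 100_000  # hypothetical transistor count in the adder

# CPU A: unpipelined adder, the whole thing is exercised each 2.5 GHz cycle.
a = switch_events_per_second(1.0, T, 2.5e9)

# CPU B: two-stage pipelined adder at double the clock. Any one instruction
# touches only half the adder per cycle, but the pipeline keeps the other
# half busy with the neighbouring instruction, so the whole adder is still
# exercised every cycle.
b = switch_events_per_second(1.0, T, 5.0e9)

# B retires twice as many adds per second and does twice the switching per
# second: energy per add is roughly unchanged, but power still rises with
# frequency.
print(b / a)  # 2.0
```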
> It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock, and power consumption should be orthogonal to clock/IPC ratio.
Nope; a lot of the latches are switching every cycle, so power is higher at higher frequency. This is what doomed NetBurst-style design.
I remember, ~10 years ago on Slashdot, some people overclocking to 7-8 GHz. Of course that was on single-core chips, but we've really pretty much completely stalled on the MHz progression, haven't we?
Clock speed is not a meaningful end unto itself, and that's why it stalled. It was used as a proxy for speed for many years, and this led to its rampant inflation. Instructions per second (IPS) is a more meaningful metric for CPU speed, and that has by no means stalled, even on a per-core basis.
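The "clock is just a proxy" point can be made concrete: throughput is work per cycle times cycles per second. The IPC and clock figures below are invented:

```python
# Throughput = instructions per cycle * cycles per second; figures invented.
def perf_gips(ipc, clock_ghz):
    """Billions of instructions per second = IPC * clock (GHz)."""
    return ipc * clock_ghz

speed_demon = perf_gips(1.2, 5.0)  # low IPC, big number on the box
brainiac = perf_gips(1.6, 3.9)     # higher IPC, modest clock

# The 3.9 GHz part outruns the "5 GHz" part despite the spec-sheet deficit.
print(brainiac > speed_demon)  # True
```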
I don't think these are hard physical limitations, more like constraints on what can be sold in laptops and desktops.
If we told Intel they could burn up to 350 watts on the CPU and a 25 lb heatsink was acceptable, we'd probably have 10 GHz processors. The problem is, there isn't a large market for that. Home users don't want a big, ugly, noisy box, and server buyers would prefer power and heat savings. Supercomputers just tie all this stuff together instead of creating some monster single core.
Actually, this was the strategy with the Pentium 4. It was a fast and power-hungry single core. Turns out, efficiency per cycle and multicore are just superior solutions.
Also, I think there are some physical limitations that keep chips below a certain clock speed. Besides, the bet has been on "smarter instead of faster", i.e. producing chips that suit our computing needs, which are more adequately supported by parallel processing.
High-performance cores are useful for problems that are hard to parallelize, but so far it seems the breakthrough only occurs when a new approach makes the problem feasible on multiprocessing platforms (e.g. graph processing is hard to parallelize due to dependencies among graph nodes; Pregel and similar systems offer a different approach)... a 50 GHz CPU won't save you if you need to process a huge graph (billions of nodes) on a single thread; it'll always take a long time.
As to the "record", I think IBM's System z already runs over 5 GHz.
The name is FX-9590. It has 8 compute cores. AMD's internal designation for this generation is "Piledriver". They've chosen to name their high-end compute family after construction equipment (the previous generation was named after racing tracks). "5 GHz Max Turbo" is also not part of the name; it's a description of its performance. Its baseline clock is probably something like 4 GHz (pulling that out of my ass). "Max Turbo" means that, using their thermal management system, they can push at least one core to 5 GHz for some period of time. The "Max" is in there because there are intermediate turbo speeds for varying thermal situations and CPU loads.
The CPU has different clock speeds depending on how much it is being used. The upper limit is set by thermal and power constraints. If you're using all the cores, the limit is relatively low. If you're only using one or two cores, the power management system will clock them higher. Since the higher clock speed is only available under certain workloads, it's called "Turbo".
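The more-headroom-with-fewer-cores behaviour described above can be sketched as a toy policy. The base and max-turbo numbers follow the parent comment's guess (4 GHz base, 5 GHz max turbo) rather than any disclosed spec, and real firmware tracks temperature and current draw, not just busy-core counts:

```python
# Toy turbo policy; clock figures follow the guess above, not real specs.
def turbo_clock_ghz(active_cores, base=4.0, max_turbo=5.0, total_cores=8):
    active_cores = max(1, min(active_cores, total_cores))
    # More idle cores -> more thermal/power headroom -> higher allowed clock.
    headroom = (total_cores - active_cores) / (total_cores - 1)
    return base + (max_turbo - base) * headroom

print(turbo_clock_ghz(1))  # 5.0 -- one busy core can hit max turbo
print(turbo_clock_ghz(8))  # 4.0 -- all cores loaded, stuck at base clock
```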
I wonder how two of these (16 cores) would compare against the new Mac Pro with its best CPU option, in software that benefits from many cores, like 3D rendering or virtualization.
I'd bet they'd be close, while the AMD costs a fraction of the Xeon's price. I know it's not a fair comparison since the FX-9000 series is not a workstation CPU, but still...
The multi-threaded POV-Ray and Cinebench tests are just about the only two benchmarks where the AMD 8-cores beat the i7-2600K, and even then just barely.
The Intel chips soundly win everything else (encoding, Photoshop...), and by almost 2x in some of the single-threaded tests.
Yes, AMD, we all know you like your big numbers, like core counts and clock speeds. It would, however, be just excellent if you could put out a product whose single-threaded performance isn't garbage. I mean, Thubans are beating your newest and greatest!
But at least you can say you've got a bigger cache, clock speed, core count, and debt than Intel.
In the end it's not the frequencies nor number of cores but performance per watt that matters.
Most computers run on batteries these days, and those that don't drain ever more expensive electricity from the wall socket and at the same time waste a lot of it producing huge amounts of heat.
The more you get out of a watt the better. You can either trade in speed for lower power or trade in power for better performance, but in either case you want the performance/watt ratio to be the highest.
I would guess the power consumption of running the chip at 5 GHz is pretty high, and the running temperatures as well. And yet there are fewer and fewer of those huge tasks that can only be done on one core.
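The performance-per-watt tradeoff above is simple to state in code; all the scores and wattages below are invented for illustration:

```python
# Performance per watt for two hypothetical parts; all numbers invented.
def perf_per_watt(score, watts):
    return score / watts

hot_chip = perf_per_watt(300.0, 220.0)   # ~1.4 points/W at 5 GHz-ish clocks
cool_chip = perf_per_watt(250.0, 84.0)   # ~3.0 points/W at a modest clock

# The slower chip does less absolute work but far more of it per watt,
# which is what matters on a battery or in a dense rack.
print(cool_chip > hot_chip)  # True
```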
Since the Pentium 133, I've never had an Intel processor in a desktop computer. I wanted one a few times, but AMD was always cheaper for the same speed. Sure, the fastest chips were almost always Intel's, but the additional bit of speed never justified the price.
How ironic that AMD are now suffering from the same thing that once gave them the edge (that is, Pentium 4's overly aggressive clock speed roadmap and lacklustre per-clock efficiency).
I guess their marketing was out of other ideas and just went back to the well one more time. It has probably been 10 years now since I really considered CPU clock speed as a factor when buying a computer.
I've yet to find an actual source for that figure whenever it comes up. Is it just tech-site comment-section spitballing, or did they actually disclose the TDP?
Agreed; I don't know how you're going to cool that thing quietly or cheaply. I think the thermal load would be lower with two CPU chips running at two-thirds the clock. It might be cheaper to build as well.
puivert | 12 years ago:
Bulldozer was always supposed to be a high-clock, long-pipeline machine, sacrificing some IPC. See e.g. http://www.anandtech.com/show/5057/the-bulldozer-aftermath-d...
kristofferR | 12 years ago:
A lot of people will probably be deceived by the big number, thinking that higher is always automatically better.
e12e | 12 years ago:
http://www.anandtech.com/show/7066/amd-announces-fx9590-and-...
Based on an old review:
http://www.anandtech.com/show/6396/the-vishera-review-amd-fx...
and its single-threaded performance numbers:
http://www.anandtech.com/show/6396/the-vishera-review-amd-fx...
If single-threaded performance scales linearly with turbo frequency (and it looks like it might): the FX-8320 (4.0 GHz turbo) scores 240.7, while the FX-8350 (4.2 GHz turbo) scores 252.1.
The difference aligns quite nicely: (240.7/4.0) × 4.2 ≈ 252.74.
And 5 GHz should give about (240.7/4.0) × 5 ≈ 300.88.
This is still lower than Intel's i5-3570K (302.2, 3.8 GHz turbo) and i7-3770K (312.4, 3.9 GHz turbo).
And Haswell has even higher performance:
http://www.anandtech.com/show/7003/the-haswell-review-intel-...
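The extrapolation above is easy to reproduce. Note it leans entirely on the linear-scaling assumption, which flatters the 5 GHz part a bit, since memory and uncore speeds don't rise with the core clock:

```python
# Linear extrapolation of single-threaded score with turbo clock, as in the
# comment above. Optimistic: assumes score per GHz stays constant.
def extrapolate_score(known_score, known_turbo_ghz, target_turbo_ghz):
    return known_score / known_turbo_ghz * target_turbo_ghz

# FX-8320: 240.7 points at 4.0 GHz turbo (from the Vishera review above).
est_8350 = extrapolate_score(240.7, 4.0, 4.2)  # ~252.7 vs 252.1 measured
est_9590 = extrapolate_score(240.7, 4.0, 5.0)  # ~300.9 vs i5-3570K's 302.2

print(est_8350, est_9590)
```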
userulluipeste | 12 years ago:
A: Wait faster!
louthy | 12 years ago:
Ridiculous name. Maybe if they put a go-faster stripe on top of the chip, people will believe it goes even faster!
sliverstorm | 12 years ago:
It depends on the workload, really. It should already be obvious that this part is not meant to be a Joe Everyman processor.