Ampere’s Product List: 80 Cores, up to 3.3 GHz at 250 W; 128 Core in Q4

170 points | rbanffy | 5 years ago | anandtech.com

188 comments

vxNsr | 5 years ago
Sooo... is Intel, like, crying in a corner right now? On one side we have AMD eating their lunch in the consumer space, and Intel still hasn’t launched a full gamut of 10nm CPUs. Apple just announced that it’s dropping them over basically the next 5 years. And now ARM really is encroaching on their core server business.

I feel like 20 years from now we’re gonna be using Intel as a cautionary tale of hubris and mismanagement, or whatever it is that caused them to fail so spectacularly.

raxxorrax | 5 years ago
Honestly, I think the proclaimed death of Intel is vastly exaggerated. AMD came back from worse places, and Intel does still have the manufacturing edge. Intel desktop CPUs still use less power, which is a big plus. How many people do you know who bought the fastest CPU available recently? Glad AMD is back on track; they were in a rough place, far worse than Intel's current situation.
ianai | 5 years ago
From what I’ve heard Intel management was taken over by marketing “professionals.” It’s an awful place to work and probably devoid of tech leadership.

AKA, yes, it’s a cautionary tale, and time to run from that ship.

DCKing | 5 years ago
It's worth noting that this is based on ARM's Neoverse N1 IP, which is also used in the AWS Graviton2. The Graviton2 benchmarks damn close to the best AMD and Intel stuff, so this chip looks very promising [1]. It's really looking to be a breakthrough year for ARM outside of the mobile market.

[1]: https://www.anandtech.com/show/15578/cloud-clash-amazon-grav...

Refefer | 5 years ago
Phoronix paints a very different picture, especially in non-synthetic workloads [1]. Graviton2 looks like a nice speedup over the first generation, but either the optimization isn't there yet or there are areas that need additional work to become more developer/HPC competitive. That said, I'm thrilled we have competition in the architecture space for general-purpose compute again.

[1] https://www.phoronix.com/scan.php?page=article&item=epyc-vs-...

embrassingstuff | 5 years ago
How different are these ARM server implementations from each other?

Will we need to recompile? Will it be almost-100%-binary-equivalent-with-some-hidden-bugs?

jeffbee | 5 years ago
Does anyone have an evaluation board for these things? Their marketing materials scream "scam" to me. For one thing, they compare against competing x86 parts by arbitrarily downrating them to 85% of their actual SPECrate scores. Why? Then they switch baseline x86 chips when making claims about power efficiency: for performance claims they use the AMD EPYC 7742, then for performance/TDP they use the 7702, which tends to make the AMD look worse because it spends the same amount of power driving its uncore but is 11% slower than the 7742.

Also, without pricing, all these efficiency claims are totally meaningless.
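The effect of the 85% derating is easy to quantify: multiplying a competitor's score by d inflates any ratio against it by a factor of 1/d, about 17.6% for d = 0.85. A minimal sketch; apparent_advantage is a hypothetical helper and the scores below are placeholders, not published SPECrate results.

```python
def apparent_advantage(arm_score: float, x86_score: float, derate: float = 1.0) -> float:
    """Hypothetical helper: ratio of an ARM part's score to an x86 part's
    score after the x86 score is multiplied by `derate` (1.0 = no derating)."""
    if x86_score <= 0 or derate <= 0:
        raise ValueError("scores and derate must be positive")
    return arm_score / (x86_score * derate)


# Placeholder scores: two parts that measure identically look ~17.6% apart
# once one of them is derated to 85% of its measured score.
fair = apparent_advantage(100.0, 100.0)
derated = apparent_advantage(100.0, 100.0, derate=0.85)
print(fair, round(derated, 3))  # 1.0 1.176
```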

IanCutress | 5 years ago
We're working with Ampere to get access when they're ready to let us test.
jzwinck | 5 years ago
This reminds me of Tilera, which had a 64-core mesh-connected CPU about ten years ago. The problems seemed to be that it was harder to optimize for due to the mesh connectivity (like NUMA, but multidimensional), low clock speeds, and a lack of improvement after an initially promising launch.

Will this be the same? It seems possible. Does it really get more work done per watt than x86?

And why does the article say "These Altra CPUs have no turbo mechanism" right below a graphic saying "3.0 GHz Turbo"?

jillesvangurp | 5 years ago
It depends a bit on how you utilize these CPUs. A lot of server software is optimized for just a few cores. Even products optimized for using more than one thread tend to be tested and used mostly in 4/8-core configurations. And then of course there are a few popular server-side languages that are typically effectively single-threaded (e.g. Python) and use multiple processes to leverage multiple cores. Launching 80 Python processes on an 80-core machine may not be the best way to utilize available resources compared to, e.g., a Java process with a few hundred threads.

With non-blocking I/O and async processing that can be good enough, but to fully utilize dozens or hundreds of CPU cores from a single process, you basically want something that can do both threading and async. But assuming each core performs at a reasonable percentage of, e.g., a Xeon core (let's say 40%) and doesn't slow down when all cores are fully loaded, you would expect a CPU with 80 cores to more than keep up with a 16-core or even a 32-core Xeon. Of course, the picture gets murkier if you throw in specialized instructions for vector processing, GPUs, etc.
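Working through the numbers in the comment: 80 cores at 40% of a Xeon core each is 80 × 0.4 = 32 Xeon-core equivalents, so at exactly 40% the chip matches a 32-core Xeon and doubles a 16-core one. A minimal sketch; the 40% figure is the commenter's hypothetical, both helpers are hypothetical, and the model ignores SMT, turbo behaviour, and memory-bandwidth limits.

```python
def xeon_core_equivalents(cores: int, relative_per_core: float) -> float:
    """Hypothetical helper: aggregate throughput in 'Xeon cores', assuming each
    core delivers `relative_per_core` of one Xeon core and does not slow down
    when every core is loaded."""
    return cores * relative_per_core


def break_even_fraction(cores: int, xeon_cores: int) -> float:
    """Hypothetical helper: per-core speed (relative to a Xeon core) that
    `cores` cores need in order to match `xeon_cores` Xeon cores."""
    return xeon_cores / cores


print(xeon_core_equivalents(80, 0.4))                         # 32.0
print(break_even_fraction(80, 16), break_even_fraction(80, 32))  # 0.2 0.4
```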

rbanffy | 5 years ago
You need a lot of memory bandwidth and large caches, or else the cores will starve. That's also why IBM mainframes have up to 4.5 GB of L4 cache.
wtallis | 5 years ago
> And why does the article say "These Altra CPUs have no turbo mechanism" right below a graphic saying "3.0 GHz Turbo"?

These chips obviously have variable clock speed, but apparently nothing like the complicated boost mechanisms on recent x86 processors. My guess is that Turbo speed here is simply full speed, and doesn't depend significantly on how many cores are active, and doesn't let the chip exceed its nominal TDP for short (or not so short) bursts the way x86 processors do.

PaulHoule | 5 years ago
These chips are practical and can go into servers that are similar in performance to x86 servers.

ARM has well-thought-out NUMA support; a system this size or larger should probably be divided into logical partitions anyway (e.g. out of 128 cores, maybe you pick 4 to be management processors to begin with).
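A minimal sketch of that kind of split: reserve a few cores for management and divide the rest into near-equal partitions. partition_cores is a hypothetical helper; it assumes core IDs run 0..total-1 and ignores NUMA node boundaries, which a real partitioning should follow.

```python
from typing import List, Tuple


def partition_cores(total: int, management: int, partitions: int) -> Tuple[List[int], List[List[int]]]:
    """Hypothetical helper: reserve core IDs 0..management-1, then split the
    remaining IDs into `partitions` contiguous, near-equal groups.
    Assumes IDs 0..total-1 and ignores NUMA node boundaries."""
    if partitions <= 0 or not 0 <= management < total:
        raise ValueError("need partitions > 0 and 0 <= management < total")
    mgmt = list(range(management))
    workers = list(range(management, total))
    size, extra = divmod(len(workers), partitions)
    groups: List[List[int]] = []
    start = 0
    for i in range(partitions):
        # The first `extra` groups each take one leftover core.
        end = start + size + (1 if i < extra else 0)
        groups.append(workers[start:end])
        start = end
    return mgmt, groups


mgmt, groups = partition_cores(128, 4, 4)
print(mgmt, [len(g) for g in groups])  # [0, 1, 2, 3] [31, 31, 31, 31]
```

On Linux, a process could then be pinned to one group with os.sched_setaffinity(0, set(groups[0])). That call is Linux-only, and on real hardware the groups should come from the topology the OS reports rather than from plain ID ranges.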

samcat116 | 5 years ago
Products like this show that Apple could have an ARM-based Mac Pro in two years relatively easily. They already have PCIe Gen 4. TDP and memory capacity are already higher than what Intel provides in the Xeon workstation line that Apple uses.
jagger27 | 5 years ago
It would be weird (and cool) if Apple ends up being the company to provide easy off the shelf access to a powerful Arm workstation.
adrianmonk | 5 years ago
If they do that, I wonder whether it would make sense for Apple to get into the ARM server CPU business while they're at it.

Currently, the Intel Xeon is used in both high-end workstations and servers. If one x86 design can be suitable for both of those, presumably one ARM design could do the same.

If they could sell server CPUs at a profit, then Apple could get more return on its design investment by getting into two markets. And they'd get more volume. Though apparently they'd be facing competition from Ampere and Amazon's Graviton.

ed25519FUUU | 5 years ago
I think it’s a good time to invest in a Mac Pro. While working from home I’m asking myself the benefit of a laptop when a desktop could give me so much more performance.
emmanueloga_ | 5 years ago
Is anybody else confused by the "Ampere" brand name? I was trying to figure out what Ampere is...

* There's one "Ampere Computing" [1], but I guess I'm not "in the know" since it is the first time I heard about it :-/

* There's one Ampere [2], "codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia".

Are both things related? Is "Nvidia's Ampere" developed by "Ampere" the company?

Also, I think Ampere is kind of a bad name for a processor line... it just makes me think of high current, power-hungry, low efficiency, etc. :-)

1: https://en.wikipedia.org/wiki/Ampere_Computing

2: https://en.wikipedia.org/wiki/Ampere_(microarchitecture)

why_only_15 | 5 years ago
They are not related as far as I can tell other than being named "Ampere".
shadykiller | 5 years ago
Most logical naming of processors I’ve ever seen. E.g.: Q80-33 = 80 cores, 3.3 GHz; Q32-17 = 32 cores, 1.7 GHz.
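A sketch of decoding those names, assuming the scheme is exactly what the two examples suggest: "Q", the core count, a hyphen, then the clock in tenths of a GHz. parse_model is a hypothetical helper, and other SKUs may not follow this pattern.

```python
import re
from typing import Tuple

# Assumed naming pattern (inferred from two examples): Q<cores>-<GHz x 10>.
_MODEL_RE = re.compile(r"Q(\d+)-(\d{2})")


def parse_model(name: str) -> Tuple[int, float]:
    """Hypothetical helper: 'Q80-33' -> (80 cores, 3.3 GHz)."""
    m = _MODEL_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    return int(m.group(1)), int(m.group(2)) / 10


print(parse_model("Q80-33"))  # (80, 3.3)
print(parse_model("Q32-17"))  # (32, 1.7)
```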
sradman | 5 years ago
> Where Graviton2 is designed to suit Amazon’s needs for Arm-based instances, Ampere’s goal is essentially to supply a better-than-Graviton2 solution to the rest of the big cloud service providers (CSPs).

So the question is whether they can land Google, Microsoft, and/or Alibaba as customers for an alternative to AWS M6g instances.

cesaref | 5 years ago
I'm interested to know what applications really scale to these core counts. When I was working with large datasets (for finance) other bottlenecks tended to dominate, not computation, so memory pressure, and throughput from the SAN were more important.

These high-density configurations were key when rack space was at a premium, but these days power is the limitation, so providing more low-power cores is interesting. I'm just not sure who is going to get the most benefit from them, though...

regularfry | 5 years ago
With 80 cores I can get 40 2-core VMs all pegging their CPUs on a single processor without any core contention. Multiply up by the number of sockets. That might be the more interesting application for cloud providers than going for a single use case for the entire box.

Where this might get interesting, depending on how the pricing stacks up, is that if you're in the cloud function business, this will increase the number of function instances you can afford to keep warmed up and ready to fire. In those situations you're not (usually) bottlenecked on the total bandwidth for the function itself; your constraint is getting from zero to having the executable in the VM it's going to run in, and from there getting it into the core past whatever it's contending with. If there's nothing to contend with and it's just waiting for a (probably fairly small) trigger signal, execution time from the point of view of whatever's downstream could easily be dominated by network transit times.
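The VM-packing arithmetic from the first paragraph, as a minimal sketch: with no overcommit and one vCPU per physical core, the count is just usable cores divided by vCPUs per VM. max_pinned_vms and its reserved parameter (cores held back for the hypervisor) are hypothetical, and the two-socket case assumes a dual-socket board.

```python
def max_pinned_vms(cores_per_socket: int, sockets: int, vcpus_per_vm: int, reserved: int = 0) -> int:
    """Hypothetical helper: number of VMs that each get dedicated physical
    cores (no overcommit, no SMT). `reserved` cores, e.g. for the hypervisor,
    are subtracted first."""
    if vcpus_per_vm <= 0:
        raise ValueError("vcpus_per_vm must be positive")
    usable = cores_per_socket * sockets - reserved
    return max(usable, 0) // vcpus_per_vm


print(max_pinned_vms(80, 1, 2))              # 40
print(max_pinned_vms(80, 2, 2))              # 80
print(max_pinned_vms(80, 2, 2, reserved=4))  # 78
```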

tyingq | 5 years ago
Plain old I/O-bound multiprocess work would be a good match, like static content and PHP sites, for example. I imagine there's quite a lot of that out there.
ambicapter | 5 years ago
As far as web servers go, more cores equal more simultaneous connections, no? I doubt network links are saturated yet.
rbanffy | 5 years ago
As cool as it is, these server announcements are somewhat disheartening.

I want a workstation with one of these.

gpm | 5 years ago
It has PCIe lanes; what, other than price, stops you from buying a rack server, sticking a graphics card in it, and calling it a workstation?
zanny | 5 years ago
You can get a Threadripper 3990x with 64 cores in a "regular" workstation.
nine_k | 5 years ago
BTW, I wonder why one might need a workstation with many less-beefy cores as opposed to a few more powerful ones. What kind of interactive tasks require that?

E.g. I suppose computer animation is better served by a GPU than by 32-64 general-purpose cores, and compilers are still not that massively parallel.

spott | 5 years ago
I'm kind of curious: what is the selling point of an ARM server? Why would I use an ARM instance on AWS or similar instead of an x86?

Are they significantly cheaper per GHz*core? If so, how hard is it to make use of that power, will a simple recompile work?

lowmemcpu | 5 years ago
Yes. Here's what AWS' page says

> deliver significant cost savings over other general-purpose instances for scale-out applications such as web servers, containerized microservices, data/log processing, and other workloads that can run on smaller cores and fit within the available memory footprint.

> provide up to 40% better price performance over comparable current generation x86-based instances for a wide variety of workloads,

From what I read, it's not terribly hard to tell your compiler to compile for a particular instruction set; you just need to do it. Cost savings and better performance are great incentives, and Apple moving its Mac platform to ARM will drive more market share, giving developers a reason to take the time to recompile.

Edit: Forgot to add the source of those quotes: https://aws.amazon.com/ec2/graviton/

ksec | 5 years ago
>what is the selling point of an ARM server? .....Are they significantly cheaper per GHz*core?

In the context of AWS.

They are cheaper for some specific workloads on AWS.

Especially since Graviton2 vCPUs on AWS are actual CPU cores, while Intel / AMD vCPUs are CPU threads.

And in general AWS offers Graviton2 instances with the same vCPU count at a 20% discount compared to AMD / Intel instances.
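A rough sketch of what the vCPU point means for cost per physical core, using the comment's assumptions (same vCPU count, 20% discount) plus two of our own: x86 vCPUs are SMT threads at 2 per core, and the prices below are hypothetical placeholders, not AWS list prices. price_per_physical_core is a hypothetical helper. This is cost per core, not cost per unit of work, since a Graviton2 core and an x86 core don't deliver the same throughput.

```python
def price_per_physical_core(hourly_price: float, vcpus: int, threads_per_core: int) -> float:
    """Hypothetical helper: hourly price divided by the number of physical
    cores behind the instance's vCPUs."""
    physical_cores = vcpus / threads_per_core
    return hourly_price / physical_cores


# Hypothetical prices: an 8-vCPU x86 instance at 1.00/hour (2 vCPUs per core)
# versus an 8-vCPU Graviton2 instance at a 20% discount (1 vCPU per core).
x86 = price_per_physical_core(1.00, 8, 2)
arm = price_per_physical_core(0.80, 8, 1)
print(x86, round(arm, 2), round(arm / x86, 2))  # 0.25 0.1 0.4
```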

bluGill | 5 years ago
Less electricity used. Air conditioning is a big cost in large data centers. Lower-power CPUs mean less heat, which means less AC needed, which drives down total costs.

Of course, different CPUs can do different amounts of work per amount of electricity used, but ARM generally works out better on a watts-per-unit-of-work basis.

nullifidian | 5 years ago
How come there isn't a trademark issue with Nvidia? I was very confused for a moment.
dbancajas | 5 years ago
"Ampere" can't be trademarked since it's the name of a scientist? Unless they are operating in the same market/segment and it can be proven there is willful intent to defraud customers? Probably a hard sell.
the_hoser | 5 years ago
The name of the company is Ampere. The name of the product is Altra. Trademarks don't automatically apply to all usages of the word.
fizixer | 5 years ago
Am I the only one who is super-annoyed at having to figure out every time whether this is Ampere the company or Ampere the new Nvidia line?

I mean, it's probably not the fault of either, and it's a huge coincidence that we're getting a flurry of news articles about both in the summer of 2020, but come on (can we have some kind of edit to the titles of HN posts to make the distinction clear?).

unexaminedlife | 5 years ago
The thing that has me bearish on CPU manufacturers in general: from what I understand, parallel architectures vastly simplify the overall schematics of CPUs while retaining the power-saving benefits.

As we approach the critical velocity (supply / demand) for parallel architectures, bootstrapping a CPU manufacturing company will become extremely feasible. IMO it's mostly the specialized knowledge needed to design CPUs that keeps this out of reach today.

I'm no expert, just have an interest in the space, so any dissenting opinions / facts welcome.

goerz | 5 years ago
Can anyone explain in a few sentences why the ARM architecture seems to outperform traditional CPUs so much? What fundamentally prevents Intel from building something comparable?
webaholic | 5 years ago
There is no inherent advantage to the ARM architecture other than it being designed recently (64-bit ARM is less than a decade old) whereas x86 has a lot of baggage it has to carry.

There is no proof that these outperform traditional CPUs at all. That is the reason you don't see them being used anywhere other than niche use cases or for cost reasons.

dahfizz | 5 years ago
It is a Reduced Instruction Set Computer (RISC). It's a greatly simplified design.

The x86_64 ISA is absolutely insane. The only way to implement it in hardware efficiently is to "compile" the super complicated instructions into micro-ops which can actually be decoded and executed on the CPU.

Said another way, Intel has to implement a compiler in hardware which compiles the machine code before it gets executed. The extra complexity means more power and less performance.

You can read more about how microcode and micro ops work here: https://en.m.wikipedia.org/wiki/Intel_Microcode
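A toy illustration of the "compile into micro-ops" idea: an instruction with a memory destination is cracked into a load, an ALU op, and a store, while a register-to-register op passes through as a single micro-op. The MicroOp type, the crack function, and the simplified "op dst, src" syntax are all hypothetical; this is not how a real x86 decoder works, and it does not by itself show the power or performance cost claimed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MicroOp:
    op: str   # "load", "store", or an ALU mnemonic such as "add"
    dst: str
    src: str


def crack(instr: str) -> List[MicroOp]:
    """Hypothetical toy decoder: split a CISC-style 'op dst, src' instruction
    into RISC-like micro-ops. Operands in [brackets] are memory references."""
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    if dst.startswith("["):
        # Memory destination: read-modify-write becomes load, op, store.
        return [
            MicroOp("load", "tmp", dst),
            MicroOp(mnemonic, "tmp", src),
            MicroOp("store", dst, "tmp"),
        ]
    if src.startswith("["):
        # Memory source: load into a temporary, then operate on registers.
        return [MicroOp("load", "tmp", src), MicroOp(mnemonic, dst, "tmp")]
    # Register-to-register: already a single micro-op.
    return [MicroOp(mnemonic, dst, src)]


print([u.op for u in crack("add [rbx], rax")])  # ['load', 'add', 'store']
print(len(crack("add rax, rcx")))               # 1
```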

klelatti | 5 years ago
Two questions:

Does TSMC have the capacity to support AMD / AWS / Ampere etc making a significant dent in the server market alongside longstanding commitments to Apple etc?

Given how much they spend on Intel CPUs, to what extent is it worth AWS / Oracle etc. making low-hundred-million-dollar investments in their own silicon or startups like Ampere just to keep Intel's pricing competitive?

ksec | 5 years ago
>TSMC....

TSMC never had a capacity problem, whatever story the mainstream media likes to run. You don't go and ask whether TSMC has another spare 10K wafers of capacity sitting around. TSMC plans its capacity based on its clients' forecasts and projections many months in advance. They will happily expand their capacity if you are willing to commit to it, like how Apple was willing to bet on TSMC and TSMC basically built a fab specifically for Apple.

This is much easier for AWS, since they are using it themselves for their own SaaS offering. It is harder for AMD, since they don't know how much they could sell. And AMD being conservative means they don't order more than they can chew.

>Given how much they spend on Intel CPUs to what extent is it worth AWS / Oracle etc making low hundred million dollar investments in their own silicon or startups like Ampere just to keep Intels pricing competitive?

I am not sure I understand the question correctly. But AWS already invested hundreds of millions in their own ARM CPU called Graviton.

paulsutter | 5 years ago
Maybe Intel should become a fab like TSMC and leave the CPU market to more innovative folks
ksec | 5 years ago
They did with Intel Custom Foundry. They tried and they failed. And they currently have no intention to try that again. At least not until they admit defeat. Which is going to take at least another few years if not longer.
ArgyleSound | 5 years ago
Isn't the fab part precisely where Intel has hit a giant stumbling block?
rurban | 5 years ago
The most interesting blurb I read was "superscalar aggressive out-of-order execution". But I read nothing about security mitigations or concerns with such "aggressive" optimizations.