Sooo... is Intel, like, crying in a corner right now? On one side AMD is eating their lunch in the consumer space, and they still haven't launched a full gamut of 10nm CPUs. Apple just announced that they're dropping them within basically the next 5 years. And now ARM really is encroaching on their core server business.
I feel like 20 years from now we're gonna be using Intel as a cautionary tale of hubris and mismanagement, or whatever it is that caused them to fail so spectacularly.
Honestly, I think the proclaimed death of Intel is vastly exaggerated. AMD came back from worse places, and Intel does still have the manufacturing edge. Intel desktop CPUs still use less power, which is a big plus. How many people do you know who bought the fastest CPU available recently? Glad AMD is back on track; they were in a rough place, far worse than Intel's current situation.
It's worth noting that this is based on ARM's Neoverse N1 IP, which is also used in the AWS Graviton2. The Graviton2 benchmarks damn close to the best AMD and Intel stuff, so this chip looks very promising [1]. It's really looking to be a breakthrough year for ARM outside of the mobile market.
Phoronix paints a very different picture, especially in non-synthetic workloads [1]. Graviton2 looks like a nice speedup over the first generation, but either the optimization isn't there yet or there are areas that need additional work to become more competitive for developer/HPC use. That said, I'm thrilled we have competition in the architecture space for general-purpose compute again.
Does anyone have an evaluation board for these things? Their marketing materials scream "scam" to me. For one thing, they compare to competing x86 parts by arbitrarily downrating them to 85% of their actual SPECrate scores. Why? Then they switch baseline x86 chips when making claims about power efficiency: for performance claims they use the AMD EPYC 7742, then for performance/TDP they use the 7702, which tends to make AMD look worse because it spends the same amount of power driving its uncore but is 11% slower than the 7742.
Also, without pricing, all these efficiency claims are totally meaningless.
This reminds me of Tilera, who had a 64-core mesh-connected CPU about ten years ago. The problems seemed to be that it was harder to optimize for due to the mesh connectivity (like NUMA, but multidimensional), low clock speeds, and a lack of improvement after an initially promising launch.
Will this be the same? It seems possible. Does it really get more work done per watt than x86?
And why does the article say "These Altra CPUs have no turbo mechanism" right below a graphic saying "3.0 Ghz Turbo"?
It depends a bit on how you utilize these CPUs. A lot of server software is optimized for just a few cores. Even products optimized for more than one thread tend to be tested and used mostly in 4- or 8-core configurations. And then there are a few popular server-side languages that are effectively single-threaded (e.g. Python) and use multiple processes to leverage multiple cores. Launching 80 Python processes on an 80-core machine may not be the best way to utilize available resources compared to, say, a Java process with a few hundred threads.
With non-blocking IO and async processing that can be good enough, but to fully utilize dozens or hundreds of CPU cores from a single process, you basically want something that can do both threading and async. Assuming each core performs at a reasonable percentage of, say, a Xeon core (let's say 40%) and doesn't slow down when all cores are fully loaded, you would expect a CPU with 80 cores to more than keep up with a 16- or even 32-core Xeon. Of course the picture gets murkier once you throw in specialized instructions for vector processing, GPUs, etc.
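A minimal sketch of both points above, assuming a CPU-bound task: scaling out with one worker process per core (since Python threads serialize on the GIL), plus the comment's own back-of-the-envelope math. The 40% per-core figure is the commenter's assumption, not a measurement.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def busy_work(n: int) -> int:
    # Stand-in for a CPU-bound task; Python threads would serialize
    # on the GIL here, but separate processes do not.
    return sum(i * i for i in range(n))

def run_on_all_cores(task_sizes):
    # One worker process per available core, e.g. 80 on an 80-core Altra.
    with ProcessPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
        return list(pool.map(busy_work, task_sizes))

def xeon_core_equivalents(arm_cores: int = 80, relative: float = 0.40) -> float:
    # The comment's estimate: 80 cores at ~40% of a Xeon core each is
    # roughly 32 Xeon-core equivalents, assuming linear scaling.
    return arm_cores * relative

if __name__ == "__main__":
    print(xeon_core_equivalents())  # 32.0
    print(len(run_on_all_cores([10_000] * 8)))
```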
> And why does the article say "These Altra CPUs have no turbo mechanism" right below a graphic saying "3.0 Ghz Turbo"?
These chips obviously have variable clock speed, but apparently nothing like the complicated boost mechanisms on recent x86 processors. My guess is that Turbo speed here is simply full speed, and doesn't depend significantly on how many cores are active, and doesn't let the chip exceed its nominal TDP for short (or not so short) bursts the way x86 processors do.
These chips are practical and can go into servers that are similar in performance to x86 servers.
ARM has well-thought-out NUMA support, and a system this size or larger should probably be divided into logical partitions anyway (e.g. out of 128 cores, maybe you pick 4 to be management processors to begin with).
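On Linux, that kind of partitioning can be sketched in software with CPU affinity. A minimal example; the core numbers are arbitrary, and `os.sched_setaffinity` is Linux-only:

```python
import os

def pin_process(cpus, pid: int = 0):
    # Restrict a process (pid 0 = the calling process) to the given
    # cores, e.g. reserving cores 0-3 of a 128-core box for management
    # work while leaving the rest to workload partitions.
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    print(pin_process([0]))  # e.g. {0}
```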
Products like this show that Apple could have an ARM-based Mac Pro in two years relatively easily. They already have PCIe Gen 4, and the TDP and memory capacity are already more than Intel provides in the Xeon workstation line that Apple uses.
If they do that, I wonder whether it would make sense for Apple to get into the ARM server CPU business while they're at it.
Currently, the Intel Xeon is used in both high-end workstations and servers. If one x86 design can be suitable for both of those, presumably one ARM design could do the same.
If they could sell server CPUs at a profit, then Apple could get more return on its design investment by getting into two markets. And they'd get more volume. Though apparently they'd be facing competition from Ampere and Amazon's Graviton.
I think it's a good time to invest in a Mac Pro. While working from home, I'm questioning the benefit of a laptop when a desktop could give me so much more performance.
> Where Graviton2 is designed to suit Amazon’s needs for Arm-based instances, Ampere’s goal is essentially to supply a better-than-Graviton2 solution to the rest of the big cloud service providers (CSPs).
So the question is whether they can land Google, Microsoft, and/or Alibaba as customers for an alternative to AWS M6g instances.
I'm interested to know what applications really scale to these core counts. When I was working with large datasets (for finance), other bottlenecks tended to dominate, not computation: memory pressure and throughput from the SAN were more important.
These high-density configurations were key when rack space was at a premium, but these days power is the limitation, so providing more low-power cores is interesting. I'm just not sure who is going to get the most benefit from them...
With 80 cores I can get 40 2-core VMs all pegging their CPUs on a single processor without any core contention. Multiply up by the number of sockets. That might be the more interesting application for cloud providers than going for a single use case for the entire box.
Where this might get interesting, depending on how the pricing stacks up, is the cloud function business: this will increase the number of function instances you can afford to keep warmed up and ready to fire. In those situations you're usually not bottlenecked on the total bandwidth for the function itself; your constraint is getting from zero to having the executable in the VM it's going to run in, and from there getting it onto a core past whatever it's contending with. If there's nothing to contend with and it's just waiting for a (probably fairly small) trigger signal, execution time from the point of view of whatever's downstream could easily be dominated by network transit times.
Plain old IO-bound multiprocess work would be a good match, like static content and PHP sites, for example. I imagine there's quite a lot of that out there.
BTW, I wonder why one might need a workstation with many less-beefy cores as opposed to several more powerful ones. What kind of interactive tasks require that?
E.g. I suppose computer animation is better served by a GPU than by 32-64 general-purpose cores, and compilers are still not so massively parallel.
> deliver significant cost savings over other general-purpose instances for scale-out applications such as web servers, containerized microservices, data/log processing, and other workloads that can run on smaller cores and fit within the available memory footprint.
> provide up to 40% better price performance over comparable current generation x86-based instances for a wide variety of workloads,
From what I read, it's not terribly hard to tell your compiler to target a particular instruction set; you just need to do it. Cost savings and better performance are great incentives, and Apple moving their Mac platform to ARM will drive more market share, giving developers reason to take the time to recompile.
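Interpreted code mostly comes along for free; it's native binaries and C extensions that need the rebuild. A tiny illustration of checking which ISA you're on at runtime (the strings noted in the comments are common values, not an exhaustive list):

```python
import platform

def native_arch() -> str:
    # Typically 'x86_64' on Intel/AMD servers and 'aarch64' on ARM
    # servers like Graviton2 or Altra ('arm64' on Apple platforms).
    # Pure-Python code runs unchanged on any of these; compiled wheels
    # and C extensions must be built for the reported architecture.
    return platform.machine()

if __name__ == "__main__":
    print(native_arch())
```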
Less electricity used. Air conditioning is a big cost in large data centers; lower-power CPUs mean less heat, which means less AC is needed, which drives down total costs.
Of course different CPUs can do different amounts of work per amount of electricity used, but ARM generally works out better on a work-per-watt basis.
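A worked example of that comparison; every number below is a placeholder for illustration, not a measured or vendor figure:

```python
def work_per_watt(requests_per_sec: float, watts: float) -> float:
    # Throughput delivered per watt of CPU power draw.
    return requests_per_sec / watts

# Hypothetical chips: the x86 part is faster in absolute terms,
# while the ARM part draws less power.
x86 = work_per_watt(requests_per_sec=100_000, watts=225)
arm = work_per_watt(requests_per_sec=90_000, watts=150)

# The slower chip can still win on work-per-watt:
print(round(arm / x86, 2))  # 1.35
```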
"Ampere" can't be trade marked since it's a name of a scientist? Unless they are operating on the same market/segment and can prove there is willful intent to defraud customers? probably a hard sell.
Am I the only one who is super-annoyed at having to figure out every time whether this is Ampere the company or Ampere the new Nvidia line?
I mean, it's probably not the fault of either, and it's a huge coincidence that we're getting a flurry of news articles about both in the summer of 2020, but c'mon (can we have some kind of edits to the titles of HN posts to make the distinction clear?).
The thing that has me bearish on CPU manufacturers in general: from what I understand, parallel architectures vastly simplify the overall design of a CPU while retaining the power-saving benefits.
As supply and demand for parallel architectures reach a critical point, the prospect of bootstrapping a CPU manufacturing company will become far more feasible. IMO it's mostly the specialized knowledge needed to design CPUs that keeps this out of reach today.
I'm no expert, just have an interest in the space, so any dissenting opinions / facts welcome.
Can anyone explain in a few sentences why the ARM architecture seems to outperform traditional CPUs so much? What fundamentally prevents Intel from building something comparable?
There is no inherent advantage to the ARM architecture other than it being designed recently (64-bit ARM is less than a decade old) whereas x86 has a lot of baggage it has to carry.
There is no proof that these outperform traditional CPUs at all. That is the reason you don't see them being used anywhere other than niche use cases or for cost reasons.
It is a Reduced Instruction Set Computer. It's a greatly simplified design.
The x86_64 ISA is absolutely insane. The only way to implement it in hardware efficiently is to "compile" the super complicated instructions into micro-ops which can actually be decoded and executed on the CPU.
Said another way, Intel has to implement a compiler in hardware which compiles the machine code before it gets executed. The extra complexity means more power and less performance.
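A toy sketch of the idea above; this only illustrates the concept of cracking one complex instruction into simple micro-ops, and bears no resemblance to how a real x86 decoder works:

```python
def decode(instruction: str) -> list[str]:
    # Toy decoder: a CISC-style "add [mem], reg" (read-modify-write on
    # memory) is cracked into three RISC-like micro-ops.
    op, dst, src = instruction.replace(",", " ").split()
    if op == "add" and dst.startswith("["):
        addr = dst.strip("[]")
        return [
            f"load tmp, {addr}",    # read the memory operand
            f"add tmp, {src}",      # do the arithmetic in a register
            f"store {addr}, tmp",   # write the result back
        ]
    return [instruction]            # register-only ops pass through

if __name__ == "__main__":
    for uop in decode("add [rbx], rax"):
        print(uop)
```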
Does TSMC have the capacity to support AMD / AWS / Ampere etc making a significant dent in the server market alongside longstanding commitments to Apple etc?
Given how much they spend on Intel CPUs, to what extent is it worth AWS / Oracle etc. making low-hundred-million-dollar investments in their own silicon, or in startups like Ampere, just to keep Intel's pricing competitive?
TSMC never had a capacity problem, whatever story the mainstream media likes to run. You don't go and ask if TSMC has another spare 10K wafers of capacity sitting around; TSMC plans its capacity based on its clients' forecasts and projections many months in advance. They will happily expand their capacity if you are willing to commit to it, like how Apple was willing to bet on TSMC, and TSMC basically built a fab specifically for Apple.
This is much easier for AWS since they use the chips themselves in their own cloud offering. It is harder for AMD since they don't know how much they could sell, and AMD, being conservative, doesn't order more than it can chew.
> Given how much they spend on Intel CPUs, to what extent is it worth AWS / Oracle etc. making low-hundred-million-dollar investments in their own silicon, or in startups like Ampere, just to keep Intel's pricing competitive?
I am not sure I understand the question correctly, but AWS has already invested hundreds of millions in its own ARM CPU, called Graviton.
They did, with Intel Custom Foundry. They tried and they failed, and they currently have no intention of trying that again, at least not until they admit defeat, which is going to take at least another few years if not longer.
The most interesting blurb I read was "superscalar aggressive out-of-order execution". But I read nothing about security mitigations or concerns that come with such "aggressive" optimizations.
AKA: yes, it's a cautionary tale, and time to run from that ship.
[1]: https://www.anandtech.com/show/15578/cloud-clash-amazon-grav...
[1] https://www.phoronix.com/scan.php?page=article&item=epyc-vs-...
Will we need to recompile? Will it be almost-100%-binary-compatible-with-some-hidden-bugs?
https://www.packet.com/cloud/servers/c2-large-arm/
* There's one "Ampere Computing" [1], but I guess I'm not "in the know" since it is the first time I heard about it :-/
* There's one Ampere [2], "codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia".
Are both things related? Is "Nvidia's Ampere" developed by "Ampere" the company?
Also, I think Ampere is kind of a bad name for a processor line... it just makes me think of high current, power-hungry, low efficiency, etc. :-)
1: https://en.wikipedia.org/wiki/Ampere_Computing
2: https://en.wikipedia.org/wiki/Ampere_(microarchitecture)
I want a workstation with one of these.
Are they significantly cheaper per GHz*core? If so, how hard is it to make use of that power? Will a simple recompile work?
Edit: Forgot to add the source of those quotes: https://aws.amazon.com/ec2/graviton/
In the context of AWS:
They are cheaper for some specific workloads on AWS.
Especially since an ARM Graviton2 vCPU on AWS is an actual CPU core, while on Intel / AMD instances a vCPU is a CPU thread.
And in general AWS offers the Graviton2 instances at a 20% discount per vCPU compared to AMD / Intel instances.
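Illustrative math for that gap; the dollar figure and the per-thread performance ratio below are made-up assumptions for the sketch, not AWS pricing or benchmark data:

```python
# A Graviton2 vCPU is a full physical core; an x86 vCPU is one SMT
# thread (two threads share a core). Both numbers here are invented.
x86_price_per_vcpu_hr = 0.05          # hypothetical $/hour per vCPU
arm_price_per_vcpu_hr = 0.05 * 0.80   # ~20% discount per vCPU

# Suppose a full core does 1.5x the work of one SMT thread (assumed):
perf_ratio = 1.5
price_perf_advantage = perf_ratio * x86_price_per_vcpu_hr / arm_price_per_vcpu_hr
print(round(price_perf_advantage, 3))  # 1.875
```

Under these assumptions the discount and the core-vs-thread difference compound, which is roughly the shape of the "up to 40% better price performance" claims, though the real numbers are workload-dependent.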
You can read more about how microcode and micro ops work here: https://en.m.wikipedia.org/wiki/Intel_Microcode