Supermicro is a pretty serious OEM, and if they actually manage to make ARM servers available (which means, in Supermicro parlance, as Global SKUs: boxes you can actually buy from distributors, not the wishy-washy 'contact us for details' listings that have been all over their website lately), that would be a great win for the ecosystem.
But I'm not exactly holding my breath. Gigabyte is another not-entirely-unserious OEM that previously did lots and lots of ARM press releases. And if you look at https://www.gigabyte.com/us/Enterprise/ARM-Server you may actually be impressed.
But trying to order one of these things is an entirely different matter. Getting pricing, let alone delivery dates, is just impossible, and if you look closer at the aforementioned web page, you'll see that the 'show SKUs' button is simply missing on many models.
And this has been going on since well before the latest COVID-induced supply-chain crisis, by the way. Getting your hands on useful ARM server hardware has always been plain impossible (unless, I guess, you're AWS, Microsoft or Google). So, while I'm hopeful that this announcement will improve things, I'm not exactly optimistic...
Totally agree. While it's hard to get your hands on a physical box, you can always get bare-metal Ampere servers from the Packet guys, a.k.a. equinixmetal.com. They have them in a few regions.
Ditto for trying to get an Ampere-based HP ProLiant RL Gen11. After trying to get an actual quote from HP for a month, we gave up, concluding that the product was vapourware.
I have been working on moving Uber's stack to arm64 during 2022[1]. We are quite far along. I can say the ecosystem for compiling stuff is pretty good, at least for our primary targets, C++ and Go.
We don't have performance numbers to show yet (and I'm not sure we will be able to publish them when we do), but I can say I am pleasantly surprised by how much easier cross-compiling is than it was ~10 years ago. Especially with zig cc. :)
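For what it's worth, a minimal sketch of what cross-compiling with zig cc looks like; the target triple and file names here are illustrative, not Uber's actual setup:

```shell
# Cross-compile a trivial C program for aarch64-linux from an x86_64 host.
# zig bundles clang plus musl/glibc headers, so no separate sysroot or
# cross-toolchain package is needed.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("hello, aarch64\n"); return 0; }
EOF
zig cc -target aarch64-linux-musl -o hello-arm64 hello.c

# For pure Go, cross-compiling is just environment variables; cgo code can
# borrow zig as the C compiler (module path omitted, illustrative only):
#   GOOS=linux GOARCH=arm64 CGO_ENABLED=1 \
#     CC="zig cc -target aarch64-linux-musl" go build ./...
```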
I tend to treat the desire to cross-compile as a rather strong indicator that the architecture is just not viable for general-purpose computing: either it's impossible to get actual hardware at all, or the hardware doesn't have decent specs. In both cases, it's going to be difficult to find a business justification for building for this architecture (again, for general-purpose computing).
Obviously, architecture bringup is an exception, but we are way past that for aarch64.
My M1 MacBook is significantly "faster" than my previous Intel 16", even though the per-core performance is roughly similar in CPU benchmarks (small advantage to M1).
I saw a previous HN comment about this being due to memory bandwidth and cache latency, but I can't seem to substantiate that comment.
The M1 is a bigger core than any other core I know of: larger reorder buffers, wider execution, etc.
Nominally speaking, it's inefficient at that point. You could pretty much fit two x86 cores in the area of one M1 core, and each x86 core supports two threads (would you rather have 4 x86 threads, or 1 M1 core?). I'm intrigued that people continue to find benefits from such a large core (even without SMT / Hyperthreading).
--------
But yes, the M1 has a huge L1 cache, huge reorder buffers, and extremely wide execution. I'd expect it to win clock-for-clock against any other core on the market.
But I'm not fully convinced that it's the best design tradeoff. Intel's E-cores + P-cores suggest that modern CPU cores may have become overly big.
My M1 Air is only a few minutes behind my old i7-7700K when encoding a 4K 10-bit HDR movie to 1080p in HandBrake. I still have a 2017 16-inch MBP that throttles horribly on any major task.
I'm going to run some personal benchmarks on all 3 machines plus my new Ryzen 7950X. It still amazes me that the fanless M1 Air is nearly as fast as my old desktop and laptop.
> I saw a previous HN comment about this being due to memory bandwidth and cache latency, but I can't seem to substantiate that comment.
I don't know if Apple has released any stats, but it makes a certain amount of sense. The technical reason to put memory and the CPU/GPU on a single package is to decrease memory latency (and possibly boost operating frequency). There's probably also the option to have a really wide memory bus.
Of course, a nice side benefit for Apple is that people can no longer buy 3rd-party RAM at market rates.
That being said...
> My M1 MacBook is significantly "faster" than my previous Intel 16", even though the per-core performance is roughly similar in CPU benchmarks (small advantage to M1).
The thing about benchmarks is, they're sometimes particular to a specific application, and even data sets. So M1 can be largely tied with an Intel chip across a broad range of benchmarks (e.g. Geekbench), but it can also be a lot faster on your specific workloads.
It's due to larger caches, memory bandwidth, and thermal throttling (or the lack thereof). Intel laptops tend to run hot and therefore tend not to sustain peak performance for long. With the M1/M2, and especially with the pro laptops with fans, you get peak performance much or all of the time.
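One way to put rough numbers on the bandwidth part of that claim is a crude sequential-read probe. This is only a sketch (single-threaded, no STREAM-style rigor, function name invented here), so treat its output as a loose lower bound rather than a real benchmark:

```go
package main

import (
	"fmt"
	"time"
)

// bandwidthGBps estimates sequential read bandwidth by summing a buffer far
// larger than typical L2/L3 caches, so most reads miss cache and hit DRAM.
func bandwidthGBps() float64 {
	buf := make([]int64, 1<<25) // 256 MiB of int64s
	for i := range buf {
		buf[i] = int64(i)
	}
	var sum int64
	const passes = 4
	start := time.Now()
	for p := 0; p < passes; p++ {
		for _, v := range buf {
			sum += v
		}
	}
	elapsed := time.Since(start).Seconds()
	if sum == 42 { // keep the compiler from discarding the loop
		fmt.Println(sum)
	}
	bytesRead := float64(passes) * float64(len(buf)) * 8
	return bytesRead / elapsed / 1e9
}

func main() {
	fmt.Printf("~%.1f GB/s sequential read\n", bandwidthGBps())
}
```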
It doesn't matter what SM does if good ARM processors aren't available in large quantities at competitive prices. The article didn't really address that.
It's clear that Apple and Amazon have their own chips, so will the SM hardware just be sold to Amazon? (I assume Apple has no use for them.)
It's all connected. More demand = more supply, even if not immediate. SM building more Arm servers does actually matter for bigger Arm production/availability.
If it were my call I'd skip ARM and go straight to RISC-V. ARM's days are numbered: they have reached their peak and will likely stop growing and show a gradual decline as RISC-V becomes increasingly mainstream.
The vast majority of the difficulty of building something isn't the instruction set but all the various IP required to build a complex and performant chip. For ARM there's a large ecosystem of IP for almost everything. It's great that RISC-V is an open instruction set, but it will be a while before the right IP is made and licensable for RISC-V.
PreInternet01 | 3 years ago
firmnoodle | 3 years ago
jasoneckert | 3 years ago
AlexThrowAway | 3 years ago
Could be a long wait.
Nuvia promised silicon in mid-2022. It's now January 2023 and there have been no updates from Qualcomm on when they're going to start sampling to OEMs.
motiejus | 3 years ago
[1]: https://jakstys.lt/2022/how-uber-uses-zig/
fweimer | 3 years ago
aidenn0 | 3 years ago
exabrial | 3 years ago
That being said, I'd love to have this sort of performance out of servers. Arm announced https://www.anandtech.com/show/17575/arm-announces-neoverse-... in 2022, but it's not really available anywhere yet.
dragontamer | 3 years ago
FredPret | 3 years ago
Some workloads are obviously faster on the big PC, but I haven't been this excited about a CPU since the Pentium 1.
wil421 | 3 years ago
zX41ZdbW | 3 years ago
You can compare Threadripper (4-channel memory), Threadripper Pro (8-channel memory, the same as EPYC servers) and M1 by memory bandwidth.
Example 1: https://www.phoronix.com/review/amd-threadripper-5965wx/5 - ClickHouse on a Threadripper Pro with fewer cores but higher memory bandwidth is faster than on a Threadripper with more cores.
Example 2: https://benchmark.clickhouse.com/hardware/ compares various servers; while Aarch64 servers place well, the top results come from EPYC servers. Newer EPYC servers have 12-channel DDR5 memory, which sounds like heaven for ClickHouse :)
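As a back-of-the-envelope illustration of why channel count matters, theoretical peak DRAM bandwidth is just channels × transfer rate × 8 bytes per 64-bit transfer (the speed grades below are assumed for the example, not measured figures):

```go
package main

import "fmt"

// peakGBps returns theoretical peak DRAM bandwidth in GB/s:
// channels × megatransfers/s × 8 bytes per 64-bit transfer.
func peakGBps(channels int, megatransfersPerSec float64) float64 {
	return float64(channels) * megatransfersPerSec * 1e6 * 8 / 1e9
}

func main() {
	// Assumed speed grades, for illustration only.
	fmt.Printf("4-ch DDR4-3200 (Threadripper):     %.1f GB/s\n", peakGBps(4, 3200))
	fmt.Printf("8-ch DDR4-3200 (Threadripper Pro): %.1f GB/s\n", peakGBps(8, 3200))
	fmt.Printf("12-ch DDR5-4800 (newer EPYC):      %.1f GB/s\n", peakGBps(12, 4800))
}
```

The jump from 204.8 GB/s to 460.8 GB/s of peak bandwidth is the "heaven for ClickHouse" part.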
nordsieck | 3 years ago
api | 3 years ago
e40 | 3 years ago
AtlasBarfed | 3 years ago
viraptor | 3 years ago
znpy | 3 years ago
spamizbad | 3 years ago
wmf | 3 years ago
Gys | 3 years ago
Oracle even has a free forever tier with ARM: https://www.oracle.com/cloud/free/
throwawaymaths | 3 years ago
rjsw | 3 years ago
[1] https://en.wikipedia.org/wiki/AWS_Graviton
bfgoodrich | 3 years ago
[deleted]
pwdisswordfish9 | 3 years ago
strangattractor | 3 years ago
strangattractor | 3 years ago
https://www.mips.com/products/risc-v/
booi | 3 years ago
zokula | 3 years ago
[deleted]
oriel | 3 years ago
I have had them on my mental blacklist since the report that they were injecting backdoors into their hardware.
I also don't trust shell games under the label of "changing suppliers" when supposedly the hardware backdoors were added on US soil.
Most references are from May 2019, but there's one from 2021:
- https://www.bloomberg.com/features/2021-supermicro/?leadSour...