item 34245255

Supermicro throws its weight behind Arm servers

127 points | rbanffy | 3 years ago | nextplatform.com

135 comments

[+] PreInternet01 | 3 years ago
Supermicro is a pretty serious OEM, and if they actually manage to make ARM servers available (which means, in Supermicro parlance, as Global SKUs: boxes you can actually buy from distributors, not the wishy-washy 'contact us for details' stuff that is all over their website lately), that would be a great win for the ecosystem.

But I'm not exactly holding my breath. Gigabyte is another not-entirely-unserious OEM that previously did lots and lots of ARM press releases. And if you look at https://www.gigabyte.com/us/Enterprise/ARM-Server you may actually be impressed.

But, trying to order one of these things is an entirely different matter. Getting pricing, let alone delivery dates, is just impossible, and if you look closer at aforementioned web page, you'll see that the 'show SKUs' button on many models is simply missing.

And this has been going on since before the latest COVID-induced supply-chain crisis, by the way. Getting your hands on useful ARM server hardware has always been plain impossible (unless, I guess, you're AWS, Microsoft or Google). So, while I'm hopeful that this announcement will improve things, I'm not exactly optimistic...

[+] firmnoodle | 3 years ago
Totally agree. While it's hard to get your hands on a physical box, you can always get bare-metal Ampere servers from the Packet guys, aka equinixmetal.com. They have them in a few regions.
[+] jasoneckert | 3 years ago
Ditto for trying to get an Ampere-based HP ProLiant RL Gen11. After trying to get an actual quote from HP for a month, we gave up thinking that the product was vapourware.
[+] AlexThrowAway | 3 years ago
>But, trying to order one of these things is an entirely different matter. Getting pricing, let alone delivery dates, is just impossible

Could be a long wait.

Nuvia promised silicon in mid 2022. It's now January 2023 and no updates from Qualcomm on when they're going to start sampling to OEMs.

[+] motiejus | 3 years ago
I spent 2022 working to move Uber's stack to arm64[1]. We are quite far along. I can say the ecosystem for compiling stuff is pretty good, at least for our primary targets, C++ and Go.

We don't have performance numbers to show yet (and I'm not sure we will be able to publish them when we do), but I can say I am pleasantly surprised by how much easier cross-compiling is than it was ~10 years ago. Especially with zig cc. :)

[1]: https://jakstys.lt/2022/how-uber-uses-zig/

[+] fweimer | 3 years ago
Do you cross-compile from aarch64 or to aarch64?

I tend to treat the desire to cross-compile as a rather strong indicator that the architecture is just not viable for general-purpose computing. Either because it's impossible to get actual hardware at all, or the hardware doesn't have decent specs. In both cases, it's going to be difficult to find a business justification for building for this architecture (again, for general-purpose computing).

Obviously, architecture bringup is an exception, but we are way past that for aarch64.

[+] aidenn0 | 3 years ago
If the Zig language never goes anywhere, at least zig cc has shown people how much better the GNU toolchain could be.
[+] exabrial | 3 years ago
My M1 MacBook is significantly "faster" than my previous Intel 16", even though per-core performance is roughly similar in CPU benchmarks (small advantage to M1).

I saw a previous HN comment about this being due to memory bandwidth and cache latency, but I can't seem to substantiate that comment.

That being said, I'd love to have this sort of performance out of servers. Arm released https://www.anandtech.com/show/17575/arm-announces-neoverse-... in 2022, but it's not really available yet anywhere.

[+] dragontamer | 3 years ago
M1 is a bigger core than any other core I know of. Just larger reorder buffers, wider execution, etc. etc.

Nominally speaking, it's inefficient at that point. You can pretty much fit 2 x86 cores inside the area of one M1 core, and each x86 core supports two threads (would you rather have 4x x86 threads, or 1x M1 core?). I'm intrigued that people continue to find benefits from such a large core (even without SMT / Hyperthreading).

--------

But yes, M1 has huge L1 cache, huge reorder buffers, and extremely wide execution. I'd expect it to win clock-for-clock vs any other core in the market.

But I'm not fully convinced that it's the best design / tradeoff. Intel's E-cores + P-cores approach suggests that modern CPU cores may have become overly big.

[+] FredPret | 3 years ago
I had one of those behemoth AMD dozen-core CPUs with 64 GB of RAM, in a desktop. My basic M1 blows it out of the water in terms of how fast it feels.

Some workloads are obviously faster on the big PC, but I haven't been this excited about a CPU since the Pentium 1.

[+] wil421 | 3 years ago
My M1 Air is only a few minutes behind my old i7-7700k in Handbrake, encoding a 4K 10-bit HDR movie into a 1080p version. I still have a 2017 16-inch MBP that throttles horribly on any major task.

I'm going to run some personal benchmarks on all 3 machines plus my new Ryzen 7950X. It still amazes me that the fanless M1 Air is nearly as fast as my old desktop and laptop.

[+] zX41ZdbW | 3 years ago
It could be about memory bandwidth, but that matters mostly for high-end data-processing software.

You can compare Threadripper (4-channel memory), Threadripper Pro (8-channel memory, same as for EPYC servers) and M1 by memory bandwidth.

Example 1: https://www.phoronix.com/review/amd-threadripper-5965wx/5 - ClickHouse on a Threadripper Pro with fewer cores but better memory bandwidth is faster than on a Threadripper with more cores.

Example 2: https://benchmark.clickhouse.com/hardware/ covers various servers; while AArch64 servers place well, the top results are from EPYC servers. Newer EPYC servers have 12-channel DDR5 memory - sounds like heaven for ClickHouse :)
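A rough way to see the memory-bandwidth ceiling yourself is a sequential-copy microbenchmark. This is an illustrative sketch (not taken from the linked benchmarks; buffer sizes and iteration counts are arbitrary):

```go
package main

import (
	"fmt"
	"time"
)

// copyBandwidth measures sequential copy bandwidth in GB/s over a
// buffer of n float64s, repeating the copy iters times.
func copyBandwidth(n, iters int) float64 {
	src := make([]float64, n)
	dst := make([]float64, n)
	for i := range src {
		src[i] = float64(i)
	}
	start := time.Now()
	for i := 0; i < iters; i++ {
		copy(dst, src)
	}
	elapsed := time.Since(start).Seconds()
	// 16 bytes move per element per iteration: 8 read + 8 written.
	bytes := float64(n) * 16 * float64(iters)
	return bytes / elapsed / 1e9
}

func main() {
	// 64 MiB per buffer: large enough to spill the caches on most
	// cores, so the result approximates DRAM bandwidth.
	gbps := copyBandwidth(8<<20, 20)
	fmt.Printf("copy bandwidth: %.1f GB/s\n", gbps)
}
```

Running this on machines with different memory subsystems (4-channel vs 8-channel, or an M1) makes the gap visible even when core counts are similar.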

[+] nordsieck | 3 years ago
> I saw a previous HN comment about this being due to memory bandwidth and cache latency, but I can't seem to substantiate that comment.

I don't know if Apple has released any stats, but it makes a certain amount of sense. The technical reason to put memory and the CPU/GPU on a single package is to decrease memory latency (and possibly boost operating frequency). There's probably also the option to have a really wide memory bus.

Of course, a nice side benefit for Apple is that people can no longer buy 3rd-party RAM at market rates.

That being said...

> My M1 MacBook is significantly "faster" than my previous Intel 16", even though the per-core performance are roughly similar in CPU benchmarks (small advantage to M1).

The thing about benchmarks is, they're sometimes particular to a specific application, and even data sets. So M1 can be largely tied with an Intel chip across a broad range of benchmarks (e.g. Geekbench), but it can also be a lot faster on your specific workloads.

[+] api | 3 years ago
It's due to larger caches, memory bandwidth, and thermal throttling or lack thereof. Intel laptops tend to run hot and therefore tend to not reach peak performance for longer periods of time. With M1/M2 and especially with the pro laptops with fans you will get peak performance much or all of the time.
[+] e40 | 3 years ago
It doesn't matter what SM does if good ARM processors aren't available in large quantities and for competitive prices. The article didn't really address that.

It's clear that Apple and Amazon have their own chips, so will the SM hardware just be sold to Amazon? (I assume Apple has no use for them.)

[+] AtlasBarfed | 3 years ago
Wouldn't Qualcomm happily supply mobile CPUs, which are basically what you need for a datacenter, aside from slapping bigger caches on them?
[+] viraptor | 3 years ago
It's all connected. More demand = more supply, even if not immediate. SM building more Arm servers does actually matter for bigger Arm production/availability.
[+] znpy | 3 years ago
I can’t wait to be able to buy these things used
[+] spamizbad | 3 years ago
How widespread are ARM servers outside of hyperscalers who naturally wouldn't touch anything Supermicro makes with a 10-foot pole?
[+] wmf | 3 years ago
They're not widespread yet because few vendors sell them. Supermicro and HPE give ARM servers more credibility.
[+] Gys | 3 years ago
Oracle Cloud offers ARM servers from Ampere, the same brand as Supermicro will offer.

Oracle even has a free forever tier with ARM: https://www.oracle.com/cloud/free/

[+] throwawaymaths | 3 years ago
Why wouldn't they touch anything supermicro makes?
[+] strangattractor | 3 years ago
If it were my call, I'd skip ARM and go straight to RISC-V. Arm's days are numbered - they have reached their peak and will likely stop growing and show a gradual decline as RISC-V becomes increasingly mainstream.
[+] booi | 3 years ago
The vast majority of the difficulty of building something isn't the instruction set but all the various IP required to build a complex and performant chip. For ARM there's a large ecosystem of IP for almost everything. It's great that RISC-V is an open instruction set, but it will be a while before the right IP is made and licensable for RISC-V.
[+] oriel | 3 years ago
Serious question: does anyone still trust supermicro?

I have them on my mental blacklist since the reveal they were injecting backdoors into their hardware.

I also don't trust shell games under the label of "changing suppliers" when supposedly the hardware backdoors were added on US soil.

Most references happened in May 2019, but there's one from 2021:

- https://www.bloomberg.com/features/2021-supermicro/?leadSour...