I helped write ParallelKnoppix when I was an undergrad; our university's second cluster ended up being a bunch of laptops with broken displays running it. Took me a whole summer.
Then the next semester I was denied the ability to take a parallel computing class because it was for graduate students only, and the prof wouldn't accept a waiver even though the class was being taught on the cluster a buddy and I had built.
That I still had root on.
So I added a script that would renice the prof's jobs to be as slow as possible.
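The source doesn't show the script, but a sketch of the idea could be as small as this (the username `prof` is a stand-in, and it would need to run as root, e.g. from root's crontab):

```shell
#!/bin/sh
# Hypothetical reconstruction of the prank: renice every process owned
# by a given user to the lowest scheduling priority (nice 19).
# Assumes the target username is "prof" and that this runs as root,
# since only root may renice other users' processes.
TARGET=prof
for pid in $(pgrep -u "$TARGET"); do
    renice -n 19 -p "$pid" >/dev/null
done
```

Run periodically, something like this keeps every job the target submits pinned at the lowest priority the scheduler allows.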
I've always wanted such a thing for various "plumbing" services (DHCP/DNS/wifi controller, etc.), but the lack of ECC and OOB management kind of disqualifies it for anything serious.
>He's running forty Blades in 2U. That's:
>
> 160 ARM cores
> 320 GB of RAM
> (up to) 320 terabytes of flash storage
>
>...in 2U of rackspace.
Yay, that's like... almost as much as a normal 1U server can do.
I do think this is sort of fool's gold in terms of actual performance. Even though the core count and RAM size are impressive, those cores are talking over Ethernet rather than a system bus.
Latency and bandwidth are atrocious in comparison, and you're going to run into problems like no individual memory allocation being able to exceed 8 GB.
For running a hundred truly independent jobs, sure, maybe you'll get equivalent performance, but that's a niche scenario that is rare in the real world.
It's actually 3U, since the 2U of 40 Pis will need almost an entire 1U 48-port PoE switch instead of plugging into the top-of-rack (ToR) switch. The switch will use 35-100 W for itself depending on features and conversion losses. If each Pi uses more than 8-9 W or so under load, you might actually need a second PoE switch.
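The power math is easy to sanity-check. Both figures below are assumptions, not from the source: ~9 W per blade under load, and a typical 370 W total PoE budget on a 48-port PoE+ switch:

```shell
# Back-of-the-envelope PoE budget check. All inputs are assumed
# typical values, not measured figures.
PIS=40
WATTS_PER_PI=9        # per-blade draw under load (assumed)
POE_BUDGET=370        # total PoE budget of a 48-port switch (assumed)
LOAD=$((PIS * WATTS_PER_PI))
HEADROOM=$((POE_BUDGET - LOAD))
echo "draw: ${LOAD} W, budget: ${POE_BUDGET} W, headroom: ${HEADROOM} W"
```

360 W of draw against a 370 W budget leaves only 10 W of headroom, which is why a second switch starts to look necessary.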
If you are building full racks, it probably makes more sense to use ordinary systems, but if you want to have a lot of actual hardware isolation at a smaller scale, it could make sense.
In some colos, they don't give you enough power to fill up your racks, so the low energy density wouldn't be such a bummer there.
It's a nice hobby project, but of course a commercial blade system will have far higher compute density. Supermicro can do 20 EPYC nodes in 8U, which at 64 cores per node is 1,280 cores in 8U, or 160 per 1U: double the core density, with far more powerful cores, so a much higher effective compute density.
Also not noted: 320 TB across 40 M.2 drives will be extremely expensive. Newegg doesn't have any 8 TB M.2 SSDs under $1,000. At about $0.12/GB, that's roughly twice the price of more normally sized drives, to say nothing of the price of spinning rust.
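The $/GB figure checks out; working in cents keeps the shell integer arithmetic exact:

```shell
# Price-per-GB sanity check: $1000 for an 8 TB M.2 drive, and 40 of
# them to reach 320 TB. Worked in cents/GB to stay in integer math.
PRICE_CENTS=100000            # $1000 per drive, in cents
GB_PER_DRIVE=8000             # 8 TB
CENTS_PER_GB=$((PRICE_CENTS / GB_PER_DRIVE))   # ~12 cents/GB
DRIVES=40
TOTAL_DOLLARS=$((DRIVES * PRICE_CENTS / 100))
echo "~${CENTS_PER_GB} cents/GB, ~\$${TOTAL_DOLLARS} for the full 320 TB"
```

So the flash alone runs on the order of $40,000 at those prices.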
Just the Pis are $35 a pop, right? So that's $1,400 of Pis, on top of whatever the rest of the stuff costs. Wonder how it compares to, I guess, whatever the price-equivalent AMD workstation chip is…
I don't feel like I have zero actual use for them. The number of Docker containers I have running on my NAS is only ever going up. These could make for a nice, expandable Kubernetes cluster.
Whether that's a good use case is a whole other thing.
That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back and connect them to a jelly-bean switch chip (8-port GbE parts are like $8 in quantity). Signal integrity over at most 4" of PCB trace should not be a problem. You could bring the network port-status lines to the front if you're interested in seeing the blinky lights of network traffic.
The big win here would be that all of the network wiring is "built in" and compact, and blade replacement is trivial.
Have your fans blow up from the bottom and stagger the "slots" on each row; if you do 32 slots per row, you could probably build a kilocore cluster in a 6U box.
Ah, the fun I would have with a lab with a nice budget.
> That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back
That stuck out to me too. They are making custom boards and a custom chassis; surely it would be cleaner to route the networking and power through a backplane instead of having a gazillion tiny patch cables and a random switch just hanging in there. It could also avoid the need for PoE by just having power buses in the backplane.
Overall, IMHO, the point of blades is that some stuff gets offloaded to the chassis, but here the chassis doesn't seem to be doing much at all.
Couldn't you do 1,024 cores in 4U with just EPYC CPUs in normal servers? At that point it's surely cheaper, also significantly easier to build, and faster, since the cores don't talk over Ethernet.
These would be awesome for build servers and for testing.
I really like Graviton from AWS, and Apple Silicon is great; I really hope we move towards ARM64 more. Arch Linux has https://archlinuxarm.org , and I would love to use these to build and test arm64 packages (without needing to use qemu hackery, awesome though it is).
Multiple server vendors now have Ampere offerings. In 2U, you can have:
* 4 Ampere Altra Max processors (in 2 or 4 servers), so about 512 cores, each much faster than anything those Raspberry Pis have
* lots of RAM, probably about 4 TB?
* ~92 TB of flash storage (or more?)
Edit: I didn't want to disparage the Compute Blade; it looks like a very fun project. It's not even the same use case as the server hardware (and it's probably the best solution if you need actual Raspberry Pis); the only common thread is the 2U and rack use.
I have this cycle every 10 years where my home infra gets to enterprise-level complexity (virtualisation/redundancy/HA) until the maintenance is more work than the joy it brings. Then, after some outage that took me way too long to fix, I decide it's over and reduce everything down to a single modem/router and WiFi AP.
I feel the pull to buy this, create a glorious heap of complexity to run my doorbell on, and be disappointed. Can't wait.
If you want "hyperscale" in your homelab, the bare-metal hypervisor needs to be x86-64, because unless you literally work for Amazon or a few others, you are unlikely to be able to purchase competitively priced, speedy ARM-based servers.
There is still near-zero mass-market availability of ARM CPUs you can stick into motherboards from one of the top ten Taiwanese vendors of serious server-class motherboards.
And don't even get me started on the inability to actually buy a Raspberry Pi in your desired configuration, at a reasonable price, and in stock so you can hit "add to cart".
Supermicro launched a whole lineup of ARM-based servers last fall. They seem to mostly offer complete systems for now, but as far as I understand, that's mostly because there are still some minor issues to iron out in terms of broader support.
I've been getting good price/perf just using the top AMD consumer CPUs. I wish someone would make an AM5-platform motherboard with out-of-band / remote-console management; that really is a must if you have a bunch of boxes and have them somewhere else. The per-core speeds are high on these; 16 cores / 32 threads per box gets you enough for a fair bit.
I've built a small Raspberry Pi k3s cluster with Pi 4s and SSDs. It works fine, but you can ultimately still feel that they are quite weak. Put differently: deploying something on k3s still ends up deploying to a single node in most cases, so you get single-node performance under most circumstances.
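If the goal is to actually exercise all the nodes, you can force replicas apart. A minimal sketch using a Pod topology spread constraint; the Deployment name `web`, its label, and the image are all made-up examples, not anything from the cluster above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # hypothetical workload name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                              # at most 1 replica imbalance
          topologyKey: kubernetes.io/hostname     # spread across nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:alpine
```

Even spread out, any single request still lands on one Pi, so per-request latency is still bounded by single-node performance; spreading only helps aggregate throughput.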
It's amazing to see how far these systems have come since my coverage in The Verge in 2014, where I built a multi-node Parallella cluster. The main problem I had then was that there was no off-the-shelf GPU-friendly library to run on it, so I ended up working with the Cray Chapel project to get some distributed vectorization support. Of course, that's all changed now.
It's not clear to me how to build a business based on RPi availability, and the clones don't really seem to be in the game. Are Raspberry Pis becoming more readily available? I don't see it.
I really want something like Nvidia's upcoming Grace CPU in blade format, but something where I can provision a chunk of SSD storage off a SAN via some sort of PCIe backplane, in the same form factor as the linked project.
I'm noticing that our JVM workloads execute _significantly_ faster on ARM. Just looking at the execution times, our lowly first-gen M1 MacBooks are significantly better than some of the best Intel or AMD hardware we have racked. I'm guessing it all has to do with memory bandwidth.
I have a few Armada 8040 boards and a couple of Raspberry Pis, but let's be real...
They're not going to get maximum performance from an NVMe disk, the CPUs are too slow, and gigabit isn't going to cut it for high-throughput applications.
Until manufacturers start shipping boards with ~32 cores clocked faster than 2 GHz and multiple 10 Gbit connections, they're nothing more than a fun nerd toy.
I would, however, say that while I'm in the general target audience, I won't do crowdfunded hardware. If it isn't actually being produced, I won't buy it. The road between prototype and production is a long one for hardware.
(Still waiting for a very cool bit of hardware, 3+ years later - suspecting that project is just *dead*)
spiritplumber|3 years ago|reply
BOFH moment :)
ilyt|3 years ago|reply
Edit: I give up, HN formatting is idiotic
xattt|3 years ago|reply
(1) https://hardware.slashdot.org/story/01/07/14/0748215/can-you...
marginalia_nu|3 years ago|reply
singron|3 years ago|reply
2OEH8eoCRo0|3 years ago|reply
Hyperscale in your Homelab. Something to hack on, learn, host things like Jellyfin, and have fun with.
guntherhermann|3 years ago|reply
* does this work? * Edit: No! Haha
Edit: No, no joy there either. I agree, it's not the most intuitive formatting syntax I've come across :)
I guess we're stuck with BEGIN_QUOTE and END_QUOTE blocks!
PragmaticPulp|3 years ago|reply
> Yay that's like... almost as much as normal 1U server can do
It’s a fun toy. Obviously it isn’t the best or most efficient way to get any job done. That’s not the point.
Enjoy it for the fun experiment that it is.
LeonM|3 years ago|reply
You don't need a cluster for that; even a 1st-gen Pi can run those services without any problem.
sys42590|3 years ago|reply
Having some affordable low-power device with ECC would be a game changer for me.
I added "affordable" to exclude expensive (and noisy) workstation-class laptops with ECC RAM.
imtringued|3 years ago|reply
As usual this is either done for entertainment value or to simulate physical networks (not clusters).
FlyingAvatar|3 years ago|reply
goodpoint|3 years ago|reply
...but the normal server is much cheaper.
mkl|3 years ago|reply
metalspot|3 years ago|reply
timerol|3 years ago|reply
guntherhermann|3 years ago|reply
What about cost, and other metrics around cost (power usage, reliability)? If space is the only factor we care about then it seems like a loss.
bee_rider|3 years ago|reply
actually_a_dog|3 years ago|reply
mayli|3 years ago|reply
MuffinFlavored|3 years ago|reply
Can you expand on this please?
unknown|3 years ago|reply
[deleted]
zaarn|3 years ago|reply
alex_suzuki|3 years ago|reply
petesergeant|3 years ago|reply
https://store.planetcom.co.uk/products/gemini-pda-1
I absolutely can't imagine what I'd use it for, and yet, my finger has hovered over "buy" many many times over the last few years
fy20|3 years ago|reply
Hamuko|3 years ago|reply
ChuckMcM|3 years ago|reply
zokier|3 years ago|reply
themoonisachees|3 years ago|reply
sitkack|3 years ago|reply
What do you have in mind? I couldn't find this part. Really am asking.
nine_k|3 years ago|reply
thejosh|3 years ago|reply
Aissen|3 years ago|reply
davgoldin|3 years ago|reply
More efficient use of space compared to my current silent mini-home lab -- also about 2U worth of space, but stacked semi-vertically [1].
That's 4 servers each with AMD 5950x, 128GB ECC, 2TB NVMe, 2x8TB SSD (64c/512GB/72TB total).
[1] https://ibb.co/Jm1SX7d
blitzar|3 years ago|reply
wildekek|3 years ago|reply
aseipp|3 years ago|reply
At this rate I have so little hope in other vendors that we'll probably just have to wait for the RPi5.
walrus01|3 years ago|reply
vegardx|3 years ago|reply
onphonenow|3 years ago|reply
Havoc|3 years ago|reply
eismcc|3 years ago|reply
https://www.theverge.com/2014/6/4/5779468/twitter-engineer-b...
lars-b2018|3 years ago|reply
exabrial|3 years ago|reply
sroussey|3 years ago|reply
Will need to deal with NUMA issues on the software side.
geerlingguy|3 years ago|reply
nubinetwork|3 years ago|reply
robbiet480|3 years ago|reply
pnathan|3 years ago|reply