I helped write ParallelKnoppix when I was an undergrad; our university's second cluster ended up being a bunch of laptops with broken displays running it. Took me a whole summer.
Then the next semester I was denied the ability to take a parallel computing class because it was for graduate students only, and the prof wouldn't accept a waiver even though the class was being taught on the cluster a buddy and I had built.
That I still had root on.
So I added a script that would renice the prof's jobs to be as slow as possible.
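The source doesn't show the script, but a sketch of the idea could be as small as this (the username `prof` is a stand-in, and it would need to run as root, e.g. from root's crontab):

```shell
#!/bin/sh
# Hypothetical reconstruction of the prank: renice every process owned
# by a given user to the lowest scheduling priority (nice 19).
# Assumes the target username is "prof" and that this runs as root,
# since only root may renice other users' processes.
TARGET=prof
for pid in $(pgrep -u "$TARGET"); do
    renice -n 19 -p "$pid" >/dev/null
done
```

Run periodically, something like this keeps every job the target submits pinned at the lowest priority the scheduler allows.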
I've always wanted such a thing for various "plumbing" services (DHCP/DNS/wifi controller, etc.), but the lack of ECC and OOB management kind of disqualifies it for anything serious.
>He's running forty Blades in 2U. That's:
>
> 160 ARM cores
> 320 GB of RAM
> (up to) 320 terabytes of flash storage
>
>...in 2U of rackspace.
Yay, that's like... almost as much as a normal 1U server can do.
I do think this is sort of fool's gold in terms of actual performance. Even though the core count and RAM size are impressive, those cores are talking over Ethernet rather than a system bus.
Latency and bandwidth are atrocious in comparison, and you're going to run into problems like no individual memory allocation being able to exceed 8 GB.
For running a hundred truly independent jobs, sure, maybe you'll get equivalent performance, but that's a niche scenario that is rare in the real world.
It's actually 3U, since the 2U of 40 Pis will need almost an entire 1U 48-port PoE switch instead of plugging into the top-of-rack (ToR) switch. The switch will use 35-100 W for itself depending on features and conversion losses. If each Pi uses more than 8-9 W or so under load, you might actually need a second PoE switch.
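The power math is easy to sanity-check. Both figures below are assumptions, not from the source: ~9 W per blade under load, and a typical 370 W total PoE budget on a 48-port PoE+ switch:

```shell
# Back-of-the-envelope PoE budget check. All inputs are assumed
# typical values, not measured figures.
PIS=40
WATTS_PER_PI=9        # per-blade draw under load (assumed)
POE_BUDGET=370        # total PoE budget of a 48-port switch (assumed)
LOAD=$((PIS * WATTS_PER_PI))
HEADROOM=$((POE_BUDGET - LOAD))
echo "draw: ${LOAD} W, budget: ${POE_BUDGET} W, headroom: ${HEADROOM} W"
```

360 W of draw against a 370 W budget leaves only 10 W of headroom, which is why a second switch starts to look necessary.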
If you are building full racks, it probably makes more sense to use ordinary systems, but if you want to have a lot of actual hardware isolation at a smaller scale, it could make sense.
In some colos, they don't give you enough power to fill up your racks, so the low energy density wouldn't be such a bummer there.
It's a nice hobby project, but of course a commercial blade system will have far higher compute density. Supermicro can do 20 EPYC nodes in 8U, which at 64 cores per node is 1,280 cores in 8U, or 160 per 1U: double the core density, with far more powerful cores, so a much higher effective compute density.
Also not noted: 320 TB across 40 M.2 drives will be extremely expensive. Newegg doesn't have any 8 TB M.2 SSDs under $1,000. At about $0.12/GB, that's roughly twice the price of more normally sized drives, to say nothing of the price of spinning rust.
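The $/GB figure checks out; working in cents keeps the shell integer arithmetic exact:

```shell
# Price-per-GB sanity check: $1000 for an 8 TB M.2 drive, and 40 of
# them to reach 320 TB. Worked in cents/GB to stay in integer math.
PRICE_CENTS=100000            # $1000 per drive, in cents
GB_PER_DRIVE=8000             # 8 TB
CENTS_PER_GB=$((PRICE_CENTS / GB_PER_DRIVE))   # ~12 cents/GB
DRIVES=40
TOTAL_DOLLARS=$((DRIVES * PRICE_CENTS / 100))
echo "~${CENTS_PER_GB} cents/GB, ~\$${TOTAL_DOLLARS} for the full 320 TB"
```

So the flash alone runs on the order of $40,000 at those prices.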
Just the Pis are $35 a pop, right? So that's $1,400 of Pis, on top of whatever the rest of the stuff costs. Wonder how it compares to, I guess, whatever the price-equivalent AMD workstation chip is…
I don't feel like I have zero actual use for them. The number of Docker containers I have running on my NAS is only ever going up. These could make for a nice, expandable Kubernetes cluster.
Whether that's a good use case is a whole other thing.
That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back and connect them to a jelly-bean switch chip (8-port GbE parts are like $8 in quantity). Signal integrity over at most 4" of PCB trace should not be a problem. You could bring the network port-status lines to the front if you're interested in seeing the blinky lights of network traffic.
The big win here would be that all of the network wiring is "built in" and compact, and blade replacement is trivial.
Have your fans blow up from the bottom and stagger the "slots" on each row; if you do 32 slots per row, you could probably build a kilocore cluster in a 6U box.
Ah, the fun I would have with a lab with a nice budget.
> That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back
That stuck out to me too. They are making custom boards and a custom chassis; surely it would be cleaner to route the networking and power through a backplane instead of having a gazillion tiny patch cables and a random switch just hanging in there. It could also avoid the need for PoE by just having power buses in the backplane.
Overall, IMHO, the point of blades is that some stuff gets offloaded to the chassis, but here the chassis doesn't seem to be doing much at all.
Couldn't you do 1,024 cores in 4U with just EPYC CPUs in normal servers? At that point it's surely cheaper, also significantly easier to build, and faster, since the cores don't talk over Ethernet.
These would be awesome for build servers and for testing.
I really like Graviton from AWS, and Apple Silicon is great; I really hope we move towards ARM64 more. Arch Linux has https://archlinuxarm.org , and I would love to use these to build and test arm64 packages (without needing to use qemu hackery, awesome though it is).
Multiple server vendors now have Ampere offerings. In 2U, you can have:
* 4 Ampere Altra Max processors (in 2 or 4 servers), so about 512 cores, each much faster than anything those Raspberry Pis have
* lots of RAM, probably about 4 TB?
* ~92 TB of flash storage (or more?)
Edit: I didn't want to disparage the Compute Blade; it looks like a very fun project. It's not even the same use case as the server hardware (and it's probably the best solution if you need actual Raspberry Pis); the only common thread is the 2U and rack use.
I have this cycle every 10 years where my home infra gets to enterprise-level complexity (virtualisation/redundancy/HA) until the maintenance is more work than the joy it brings. Then, after some outage that took me way too long to fix, I decide it's over and reduce everything down to a single modem/router and WiFi AP.
I feel the pull to buy this, create a glorious heap of complexity to run my doorbell on, and be disappointed. Can't wait.
If you want "hyperscale" in your homelab, the bare-metal hypervisor needs to be x86-64, because unless you literally work for Amazon or a few others, you are unlikely to be able to purchase competitively priced, speedy ARM-based servers.
There is still near-zero mass-market availability of ARM CPUs you can stick into motherboards from one of the top ten Taiwanese vendors of serious server-class motherboards.
And don't even get me started on the inability to actually buy a Raspberry Pi in your desired configuration, at a reasonable price, and in stock so you can hit "add to cart".
Supermicro launched a whole lineup of ARM-based servers last fall. They seem to mostly offer complete systems for now, but as far as I understand, that's mostly because there are still some minor issues to iron out in terms of broader support.
I've been getting good price/perf just using the top AMD consumer CPUs. I wish someone would make an AM5-platform motherboard with out-of-band / remote-console management; that really is a must if you have a bunch of boxes and have them somewhere else. The per-core speeds are high on these; 16 cores / 32 threads per box gets you enough for a fair bit.
I've built a small Raspberry Pi k3s cluster with Pi 4s and SSDs. It works fine, but you can ultimately still feel that they are quite weak. Put differently: deploying something on k3s still ends up deploying to a single node in most cases, so you get single-node performance under most circumstances.
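If the goal is to actually exercise all the nodes, you can force replicas apart. A minimal sketch using a Pod topology spread constraint; the Deployment name `web`, its label, and the image are all made-up examples, not anything from the cluster above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # hypothetical workload name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                              # at most 1 replica imbalance
          topologyKey: kubernetes.io/hostname     # spread across nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:alpine
```

Even spread out, any single request still lands on one Pi, so per-request latency is still bounded by single-node performance; spreading only helps aggregate throughput.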
It's amazing to see how far these systems have come since my coverage in The Verge in 2014, where I built a multi-node Parallella cluster. The main problem I had then was that there was no off-the-shelf GPU-friendly library to run on it, so I ended up working with the Cray Chapel project to get some distributed vectorization support. Of course, that's all changed now.
It's not clear to me how to build a business based on RPi availability, and the clones don't really seem to be in the game. Are Raspberry Pis becoming more readily available? I don't see it.
I really want something like Nvidia's upcoming Grace CPU in blade format, but something where I can provision a chunk of SSD storage off a SAN via some sort of PCIe backplane, in the same form factor as the linked project.
I'm noticing that our JVM workloads execute _significantly_ faster on ARM. Just looking at the execution times, our lowly first-gen M1 MacBooks are significantly better than some of the best Intel or AMD hardware we have racked. I'm guessing it all has to do with memory bandwidth.
I have a few Armada 8040 boards and a couple of Raspberry Pis, but let's be real...
They're not going to get maximum performance from an NVMe disk, the CPUs are too slow, and gigabit isn't going to cut it for high-throughput applications.
Until manufacturers start shipping boards with ~32 cores clocked faster than 2 GHz and multiple 10 Gbit connections, they're nothing more than a fun nerd toy.
I would, however, say that while I'm in the general target audience, I won't do crowdfunded hardware. If it isn't actually being produced, I won't buy it. The road between prototype and production is a long one for hardware.
(Still waiting for a very cool bit of hardware, 3+ years later - suspecting that project is just *dead*)
spiritplumber|3 years ago|reply
BOFH moment :)
ilyt|3 years ago|reply
Edit: I give up, HN formatting is idiotic
xattt|3 years ago|reply
(1) https://hardware.slashdot.org/story/01/07/14/0748215/can-you...
marginalia_nu|3 years ago|reply
singron|3 years ago|reply
2OEH8eoCRo0|3 years ago|reply
Hyperscale in your Homelab. Something to hack on, learn, host things like Jellyfin, and have fun with.
guntherhermann|3 years ago|reply
* does this work? * Edit: No! Haha
Edit: No, no joy there either. I agree, it's not the most intuitive formatting syntax I've come across :)
I guess we're stuck with BEGIN_QUOTE and END_QUOTE blocks!
PragmaticPulp|3 years ago|reply
> Yay that's like... almost as much as normal 1U server can do
It’s a fun toy. Obviously it isn’t the best or most efficient way to get any job done. That’s not the point.
Enjoy it for the fun experiment that it is.
LeonM|3 years ago|reply
You don't need a cluster for that; even a 1st-gen Pi can run those services without any problem.
sys42590|3 years ago|reply
Having some affordable low-power device with ECC would be a game changer for me.
I added "affordable" to exclude expensive (and noisy) workstation-class laptops with ECC RAM.
imtringued|3 years ago|reply
As usual this is either done for entertainment value or to simulate physical networks (not clusters).
FlyingAvatar|3 years ago|reply
goodpoint|3 years ago|reply
...but the normal server is much cheaper.
mkl|3 years ago|reply
metalspot|3 years ago|reply
timerol|3 years ago|reply
guntherhermann|3 years ago|reply
What about cost, and other metrics around cost (power usage, reliability)? If space is the only factor we care about then it seems like a loss.
bee_rider|3 years ago|reply
actually_a_dog|3 years ago|reply
mayli|3 years ago|reply
MuffinFlavored|3 years ago|reply
Can you expand on this please?
unknown|3 years ago|reply
[deleted]
zaarn|3 years ago|reply
alex_suzuki|3 years ago|reply
petesergeant|3 years ago|reply
https://store.planetcom.co.uk/products/gemini-pda-1
I absolutely can't imagine what I'd use it for, and yet, my finger has hovered over "buy" many many times over the last few years
fy20|3 years ago|reply
Hamuko|3 years ago|reply
ChuckMcM|3 years ago|reply
zokier|3 years ago|reply
themoonisachees|3 years ago|reply
sitkack|3 years ago|reply
What do you have in mind? I couldn't find this part. Really am asking.
nine_k|3 years ago|reply
thejosh|3 years ago|reply
Aissen|3 years ago|reply
davgoldin|3 years ago|reply
More efficient use of space compared to my current silent mini-home lab -- also about 2U worth of space, but stacked semi-vertically [1].
That's 4 servers each with AMD 5950x, 128GB ECC, 2TB NVMe, 2x8TB SSD (64c/512GB/72TB total).
[1] https://ibb.co/Jm1SX7d
blitzar|3 years ago|reply
wildekek|3 years ago|reply
aseipp|3 years ago|reply
At this rate I have so little hope in other vendors that we'll probably just have to wait for the RPi5.
walrus01|3 years ago|reply
vegardx|3 years ago|reply
onphonenow|3 years ago|reply
Havoc|3 years ago|reply
eismcc|3 years ago|reply
https://www.theverge.com/2014/6/4/5779468/twitter-engineer-b...
lars-b2018|3 years ago|reply
exabrial|3 years ago|reply
sroussey|3 years ago|reply
Will need to deal with NUMA issues on the software side.
geerlingguy|3 years ago|reply
nubinetwork|3 years ago|reply
robbiet480|3 years ago|reply
pnathan|3 years ago|reply