crabique|5 months ago
Recently I've been looking into Garage and liking the idea of it, but it seems to have a very different design (no EC).
epistasis|5 months ago
And remember that there's a multiplication of IOPS for any individual client IOP, whether you're using triplicate storage or erasure coding. S3 also has IOP multiplication, which they solve with tons of HDDs.
For big object storage that's mostly streaming 4MB chunks, this is no big deal. If you have tons of small random reads and writes across many keys or a single big key, that's when you need to make sure your backing store can keep up.
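A back-of-the-envelope illustration of that amplification (the client IOPS figure and the 4+2 coding scheme are hypothetical, just to show the arithmetic):

```shell
# Rough sketch of backend write amplification; numbers are illustrative only.
client_iops=1000

# replica=3: every client write is performed on 3 disks
replica_writes=$((client_iops * 3))

# 4+2 erasure coding: a full-stripe write touches 6 disks (4 data + 2 parity)
ec_writes=$((client_iops * (4 + 2)))

echo "replica=3 backend write IOPS: $replica_writes"
echo "EC 4+2 backend write IOPS:   $ec_writes"
```

Small random writes are worse still for EC, since a partial-stripe update can require read-modify-write of the parity.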
bayindirh|5 months ago
However, if you need high IOPS, you need flash on MDS for Lustre and some Log SSDs (esp. dedicated write and read ones) for ZFS.
crabique|5 months ago
Basically, I have a single big server with 80 high-capacity HDDs and 4 high-endurance NVMes, and it's the S3 endpoint that gets a lot of writes.
So yes, for now my best candidate is ZFS + Garage, this way I can get away with using replica=1 and rely on ZFS RAIDz for data safety, and the NVMEs can get sliced and diced to act as the fast metadata store for Garage, the "special" device/small records store for the ZFS, the ZIL/SLOG device and so on.
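A minimal sketch of that pool layout; the device names, raidz2 vdev widths, and the `special_small_blocks` threshold are all illustrative assumptions, not a recommendation:

```shell
# Sketch only: 80 HDDs as eight 10-wide raidz2 vdevs (two shown here),
# with the 4 NVMes partitioned so the same devices can also serve as
# Garage's metadata store. Device names are hypothetical.
zpool create tank \
  raidz2 /dev/sd{a..j} \
  raidz2 /dev/sd{k..t} \
  special mirror /dev/nvme0n1p1 /dev/nvme1n1p1 \
  log mirror /dev/nvme2n1p1 /dev/nvme3n1p1

# Route metadata and small records to the NVMe special vdev:
zfs set special_small_blocks=64K tank
# Large records suit big-object streaming workloads:
zfs set recordsize=1M tank
```

The SLOG only helps synchronous writes, so whether the `log` vdev pays off depends on how much of the S3 write path ends up fsync-bound.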
Currently it's a bit of a Frankenstein's monster: XFS+OpenCAS as the backing storage for an old version of MinIO (containerized to run as 5 instances). I'm looking to replace it with a simpler design and hopefully get better performance.
elitepleb|5 months ago
It's the classic horizontal/vertical scaling trade off, that's why flash tends to be more space/cost efficient for speedy access.
giancarlostoro|5 months ago
It's open source / free to boot. I have no direct experience with it myself however.
https://www.gluster.org/
mbreese|5 months ago
I used to keep a large cluster array with Gluster+ZFS (1.5PB), and I can’t say I was ever really that impressed with the performance. That said — I really didn’t have enough horizontal scaling to make it worthwhile from a performance aspect. For us, it was mainly used to make a union file system.
But, I can’t say I’d recommend it for anything new.
epistasis|5 months ago
For single client performance, ceph beat the performance I get from S3 today for large file copies. Gluster had difficult-to-characterize performance, but our setup with big fast RAID arrays seems to still outperform what I see of AWS's Lustre-as-a-service today for our use case of long sequential reads and writes.
We would occasionally try cephFS, the POSIX shared network filesystem, but it couldn't match our gluster performance for our workload. But also, we built the ceph long term storage to maximize TB/$, so it was at a disadvantage compared to our gluster install. Still, I never heard of cephFS being used anywhere despite it being the original goal in the papers back at UCSC. Keep an eye on CERN for news about one of the bigger ceph installs with public info.
I love both of the systems, and see ceph used everywhere today, but am surprised and happy to see that gluster is still around.
nh2|5 months ago
What performance issues and footguns do you have in mind?
I also like that CephFS has a performance benefit that doesn't seem to exist anywhere else: automatic transparent Linux buffer caching, so that writes are extremely fast and local until you fsync() or other clients want to read, and repeat-reads or read-after-write are served from local RAM.
zenmac|5 months ago
What do you mean by no EC?
crabique|5 months ago