item 46637828

Eikon | 1 month ago

ZeroFS [0] outperforms JuiceFS on common small file workloads [1] while only requiring S3 and no 3rd party database.

[0] https://github.com/Barre/ZeroFS

[1] https://www.zerofs.net/zerofs-vs-juicefs

huntaub|1 month ago

Respect to your work on ZeroFS, but I find it kind of off-putting for you to come in and immediately put down JuiceFS, especially with benchmark results that don't make a ton of sense, and are likely making apples-to-oranges comparisons with how JuiceFS works or mount options.

For example, it doesn't really make sense that "92% of data modification operations" would fail on JuiceFS, which makes me question a lot of the methodology in these tests.

selfhoster1312|1 month ago

I have very limited experience with object storage, but my humble benchmarks with JuiceFS + minio/garage [1] showed very bad performance (i.e. total collapse within a few hours) when running lots of small operations (torrents).

I wouldn't be surprised if there's a lot of tuning that could be done, but after days of reading docs and experimenting with different settings I just assumed JuiceFS was a very bad fit for archives shared through BitTorrent. I hope to be proven wrong, but in the meantime I'm very glad ZeroFS was mentioned as an alternative for small files/operations. I'll try to find the time to benchmark it too.

[1] https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/1021

Eikon|1 month ago

> but I find it kind of off-putting for you to come in and immediately put down JuiceFS, especially with benchmark results that don't make a ton of sense, and are likely making apples-to-oranges comparisons with how JuiceFS works or mount options.

The benchmark suite is trivial and open source [1].

Is performing benchmarks “putting down” these days?

If you believe the benchmarks are unfair to JuiceFS for one reason or another, please put up a PR with a better methodology or corrected numbers. I'd happily merge it.

EDIT: From your profile, it seems like you are running a VC-backed competitor; it would be fair to mention that…

[1] https://github.com/Barre/ZeroFS/tree/main/bench
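(The linked repo is the authoritative suite. Purely as an illustration of what a "trivial" small-file benchmark looks like — the function name and parameters below are my own, not ZeroFS's — a sketch:)

```python
import os
import tempfile
import time


def bench_small_files(root: str, n: int = 1000, size: int = 4096) -> float:
    """Create n small files of `size` bytes under root; return elapsed seconds."""
    payload = os.urandom(size)
    start = time.perf_counter()
    for i in range(n):
        path = os.path.join(root, f"f{i:05d}")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force each write to stable storage, as a real benchmark would
    return time.perf_counter() - start


if __name__ == "__main__":
    # Point this at a mount of the filesystem under test instead of a temp dir.
    with tempfile.TemporaryDirectory() as d:
        elapsed = bench_small_files(d, n=100)
        print(f"100 files in {elapsed:.3f}s")
```

Running the same script against two mounts (with identical mount options where they exist on both) is the apples-to-apples part that's easy to get wrong.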

eYrKEC2|1 month ago

I'm always curious about the option space. I appreciate folks talking about the alternatives. What's yours?

Dylan16807|1 month ago

> ZeroFS supports running multiple instances on the same storage backend: one read-write instance and multiple read-only instances.

Well, that's a big limiting factor that needs to be front and center in any distributed filesystem comparison.

Though I'm confused, the page says things like "ZeroFS makes S3 behave like a regular block device", but in that case how do read-only instances mount it without constantly getting their state corrupted out from under them? Is that implicitly talking about the NBD access, and the other access modes have logic to handle that?

Edit: What I want to see is a ZeroFS versus s3backer comparison.

Edit 2: changed the question at the end

ChocolateGod|1 month ago

Let's remember that JuiceFS can be set up very easily to avoid a single point of failure (by replicating the metadata engine), whereas ZeroFS seems to have exactly that.

If I was a company I know which one I'd prefer.

__turbobrew__|1 month ago

Yeah, that is a big caveat to ZeroFS: a single point of failure. It is like saying I can write a faster etcd by only having a single node. Sure, that is possible, but the hard part of distributed systems is the coordination, and coordination always makes performance worse.

I personally went with Ceph for distributed storage. I have a lot more confidence in Ceph than in JuiceFS or ZeroFS. Building and running a Ceph cluster is more complex, but with that complexity you get much cheaper S3, block storage, and CephFS.

dpacmittal|1 month ago

The magnitude of performance difference alone immediately makes me skeptical of your benchmarking methodology.

selfhoster1312|1 month ago

I'm not an expert in any way, but I personally benchmarked [1] JuiceFS performance totally collapsing under very small files/operations (torrenting). It's good to be skeptical, but it might just be that the bar is very low for this specific use case (IIRC JuiceFS was configured and optimized for block sizes of several MBs).

https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/1021

wgjordan|1 month ago

For a proper comparison, it's also significant to note that JuiceFS is Apache-2.0 licensed while ZeroFS is dual AGPL-3.0/commercial licensed, which significantly limits the latter's ability to be adopted outside of open source projects.

anonymousDan|1 month ago

Why would this matter if you're just using the database?

maxmcd|1 month ago

Does having to maintain the SlateDB instance as a consistent singleton (even with write fencing) make this as operationally tricky as a third-party DB?

Eikon|1 month ago

It's not great UX on that front. I am currently working on coordination (through S3, not node-to-node communication), so that you can just spawn instances without thinking about it.
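(Singleton-writer fencing is commonly built on a monotonically increasing epoch plus compare-and-swap against shared storage; S3 now supports conditional PUTs, which makes this possible without a separate coordinator. A toy sketch of the general pattern, not ZeroFS's actual mechanism — `EpochStore` below is a hypothetical in-memory stand-in for the object store:)

```python
import threading


class EpochStore:
    """Toy stand-in for an object store that supports compare-and-swap on an epoch key."""

    def __init__(self) -> None:
        self._epoch = 0
        self._lock = threading.Lock()

    def try_claim(self, expected: int) -> bool:
        """Atomically bump the writer epoch iff it still equals `expected`.

        A new writer claims the next epoch; any writer still holding an older
        epoch is fenced off, because its conditional writes no longer match.
        """
        with self._lock:
            if self._epoch == expected:
                self._epoch += 1
                return True
            return False

    def current(self) -> int:
        with self._lock:
            return self._epoch


store = EpochStore()
e = store.current()
assert store.try_claim(e)      # new writer fences itself in at epoch e + 1
assert not store.try_claim(e)  # a stale writer using the old epoch is rejected
```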

corv|1 month ago

Looks like the underdog beats it handily, with easier deployment to boot. What's the catch?

aeblyve|1 month ago

ZeroFS is a single-writer architecture and therefore has overall bandwidth limited by the box it's running on.

JuiceFS scales out horizontally, as each individual client writes/reads directly to/from S3; as long as the metadata engine keeps up, it has essentially unlimited bandwidth across many compute nodes.

But as the benchmark shows, it is fiddly, especially for workloads with many small files, and is pretty wasteful in terms of S3 operations, which for the largest workloads carries meaningful cost.
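(Back-of-envelope arithmetic shows why per-file S3 operations get expensive. Assuming an illustrative price of $5 per million PUT requests and a hypothetical workload of ten million 4 KiB files, one object per file versus packing into 4 MiB blocks differs by roughly three orders of magnitude in request count:)

```python
# Illustrative request-cost arithmetic for a small-file workload.
PUT_COST = 5.00 / 1_000_000   # dollars per PUT request (example pricing, not a quote)

n_files = 10_000_000          # ten million small files
file_size = 4 * 1024          # 4 KiB each

naive_puts = n_files          # one S3 object per file
cost_naive = naive_puts * PUT_COST

block_size = 4 * 1024 * 1024  # pack files into 4 MiB blocks before uploading
packed_puts = (n_files * file_size + block_size - 1) // block_size  # ceiling division
cost_packed = packed_puts * PUT_COST

print(f"naive:  {naive_puts:>10} PUTs -> ${cost_naive:.2f}")
print(f"packed: {packed_puts:>10} PUTs -> ${cost_packed:.2f}")
```

The same amplification applies to GETs on read-heavy workloads, which is why block size tuning matters so much for these filesystems.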

I think both have their place at the moment. But the space of "advanced S3-backed filesystems" is... advancing these days.