Respect for your work on ZeroFS, but I find it kind of off-putting for you to come in and immediately put down JuiceFS, especially with benchmark results that don't make a ton of sense, and are likely making apples-to-oranges comparisons with how JuiceFS works or mount options.
For example, it doesn't really make sense that "92% of data modification operations" would fail on JuiceFS, which makes me question a lot of the methodology in these tests.
I have very limited experience with object storage, but my humble benchmarks with JuiceFS + MinIO/Garage [1] showed very bad performance (i.e. total collapse within a few hours) when running lots of small operations (torrents).
I wouldn't be surprised if there's a lot of tuning that can be achieved, but after days of reading docs and experimenting with different settings I just assumed JuiceFS was a very bad fit for archives shared through BitTorrent. I hope to be proven wrong, but in the meantime I'm very glad ZeroFS was mentioned as an alternative for small files/operations. I'll try to find the time to benchmark it too.
> but I find it kind of off-putting for you to come in and immediately put down JuiceFS, especially with benchmark results that don't make a ton of sense, and are likely making apples-to-oranges comparisons with how JuiceFS works or mount options.
The benchmark suite is trivial and open source [1].
Is performing benchmarks “putting down” these days?
If you believe that the benchmarks are unfair to JuiceFS for one reason or another, please put up a PR with a better methodology or corrected numbers. I’d happily merge it.
EDIT: From your profile, it seems like you are running a VC-backed competitor; it would be fair to mention that…
> ZeroFS supports running multiple instances on the same storage backend: one read-write instance and multiple read-only instances.
Well that's a big limiting factor that needs to be at the front in any distributed filesystem comparison.
Though I'm confused: the page says things like "ZeroFS makes S3 behave like a regular block device", but in that case how do read-only instances mount it without constantly getting their state corrupted out from under them? Is that implicitly talking about the NBD access, and the other access modes have logic to handle that?
Edit: What I want to see is a ZeroFS versus s3backer comparison.
Let's remember that JuiceFS can be set up very easily to not have a single point of failure (by replicating the metadata engine), while ZeroFS seems to have exactly that.
Yeah, that is a big caveat to ZeroFS: a single point of failure. It is like saying I can write a faster etcd by only having a single node. Sure, that is possible, but the hard part of distributed systems is the coordination, and coordination always makes performance worse.
I personally went with Ceph for distributed storage. I have a lot more confidence in Ceph than in JuiceFS or ZeroFS. I realize building and running a Ceph cluster is more complex, but with that complexity you get much cheaper S3, block storage, and CephFS.
I'm not an expert in any way, but I personally benchmarked [1] JuiceFS performance totally collapsing under very small files/operations (torrenting). It's good to be skeptical, but it might just be that the bar is very low for this specific use case (IIRC JuiceFS was configured and optimized for block sizes of several MBs).
For a proper comparison, it's also significant to note that JuiceFS is Apache-2.0 licensed while ZeroFS is dual AGPL-3.0/commercial licensed, significantly limiting the latter's ability to be easily adopted outside of open source projects.
It’s not great UX from that angle. I am currently working on coordination (through S3, not node-to-node communication), so that you can just spawn instances without thinking about it.
ZeroFS is a single-writer architecture and therefore has overall bandwidth limited by the box it's running on.
JuiceFS scales out horizontally, as each individual client writes/reads directly to/from S3; as long as the metadata engine keeps up, it has essentially unlimited bandwidth across many compute nodes.
But as the benchmark shows, it is fiddly, especially for workloads with many small files, and is pretty wasteful in terms of S3 operations, which for the largest workloads has a meaningful cost.
I think both have their place at the moment. But the space of "advanced S3-backed filesystems" is... advancing these days.
huntaub|1 month ago
selfhoster1312|1 month ago
[1] https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/1021
Eikon|1 month ago
[1] https://github.com/Barre/ZeroFS/tree/main/bench
eYrKEC2|1 month ago
Dylan16807|1 month ago
Edit 2: changed the question at the end
ChocolateGod|1 month ago
If I was a company I know which one I'd prefer.
__turbobrew__|1 month ago
dpacmittal|1 month ago
selfhoster1312|1 month ago
wgjordan|1 month ago
anonymousDan|1 month ago
maxmcd|1 month ago
Eikon|1 month ago
corv|1 month ago
aeblyve|1 month ago
victorbjorklund|1 month ago