rostayob | 5 months ago

(Disclaimer: I'm one of the authors of TernFS, and while we evaluated Ceph, I am not intimately familiar with it.)

Main factors:

* Ceph stores both metadata and file contents using the same object store (RADOS). TernFS uses a specialized database for metadata which takes advantage of various properties of our datasets (immutable files, few moves between directories, etc.).

* While Ceph is capable of storing petabytes, we currently store ~600 PB in a single TernFS deployment. When we last checked, that was an order of magnitude more than even very large Ceph deployments.

* More generally, we wanted a system that we knew we could easily adapt to our needs and, more importantly, quickly fix when something went wrong. We estimated that building something new would be less costly overall than adapting Ceph (or some other open source solution).

mgrandl|5 months ago

There are definitely insanely large Ceph deployments. I have seen hundreds of PB in production myself. Also, your use case sounds like something that should be quite manageable for Ceph because of its limited metadata activity, which tends to be the main pain point with CephFS.

rostayob|5 months ago

I'm not fully up to date, since we looked into this a few years ago. At the time, the CERN deployments of Ceph were cited as particularly large examples, and they topped out at ~30 PB.

Also note that when I say "single deployment" I mean that the full storage capacity is not subdivided in any way (i.e. there are no "zones" or "realms" or similar concepts). We wanted this to be the case after experiencing situations where we had significant overhead due to having to rebalance different storage buckets (albeit with a different piece of software, not Ceph).

If there are EB-scale Ceph deployments I'd love to hear more about them.

kachapopopow|5 months ago

Ceph is more of a "here's a raw block of data, do whatever the hell you want with it" system; it's not really good for immutable data.

eps|5 months ago

The last point is an extremely important advantage that is often overlooked and denigrated. Having a complex system that you know inside out because you built it from scratch pays off in gold in the long term.

_jsmh|5 months ago

Any compression at the filesystem level?

rostayob|5 months ago

No. We do have our own custom compressor, but it sits outside the filesystem.