item 45290720


ttfvjktesd | 5 months ago

How does TernFS compare to CephFS, and why not CephFS, since it is also tested at the multi-petabyte range?


rostayob | 5 months ago

(Disclaimer: I'm one of the authors of TernFS and while we evaluated Ceph I am not intimately familiar with it)

Main factors:

* Ceph stores both metadata and file contents using the same object store (RADOS). TernFS uses a specialized database for metadata which takes advantage of various properties of our datasets (immutable files, few moves between directories, etc.).

* While Ceph is capable of storing PBs, we currently store ~600 PB on a single TernFS deployment. The last time we checked, this was an order of magnitude more than even very large Ceph deployments.

* More generally, we wanted a system that we knew we could easily adapt to our needs and more importantly quickly fix when something went wrong, and we estimated that building out something new rather than adapting Ceph (or some other open source solution) would be less costly overall.

mgrandl | 5 months ago

There are definitely insanely large Ceph deployments; I have seen hundreds of PBs in production myself. Also, your use case sounds like something that should be quite manageable for Ceph, given the limited metadata activity, which tends to be the main pain point with CephFS.

eps | 5 months ago

The last point is an extremely important advantage that is often overlooked, even denigrated. Having a complex system that you know inside out because you built it from scratch pays off in gold over the long term.

_jsmh | 5 months ago

Any compression at the filesystem level?

jleahy | 5 months ago

The seamless realtime intercontinental replication is a key feature for us, maybe the most important single feature, and AFAIK you can’t do that with Ceph (even if Ceph could scale to our original 10 exabyte target in one instance).

cmdrk | 5 months ago

CephFS implements a (fully?) POSIX filesystem, while TernFS seems to trade away permissions and mutability for further scale.

Their docs mention they have a custom kernel module, which I suppose is (today) shipped out of tree. Ceph's client is in-tree, and there is also a FUSE implementation.
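For reference, the two standard ways of mounting CephFS look roughly like this (the monitor hostname, mount points, and the CephX user name are placeholders; exact options vary by release):

```shell
# Kernel client (in-tree): mount via a monitor address, authenticating
# as the CephX user "admin" with a secret stored in a local file.
sudo mount -t ceph mon1.example.com:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# FUSE client: ceph-fuse reads cluster details from /etc/ceph/ceph.conf.
sudo ceph-fuse /mnt/cephfs
```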

The docs mention that TernFS also has its own S3 gateway, while RADOSGW is fully separate from CephFS.

jcul | 5 months ago

My (limited) understanding is that CephFS, RGW (S3), and RBD (block device) are all different things using the same underlying RADOS storage.

You can't mount RGW S3 objects as CephFS or anything; they are completely separate (not counting things like goofys, s3fs, etc.), even if both are on the same RADOS cluster.
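To illustrate the separation on the Ceph side: RGW data is reached over the S3 API, and presenting it as a filesystem requires a userspace bridge rather than CephFS (the endpoint, bucket, and mount point below are placeholders):

```shell
# Objects in RGW are accessed over S3, not as CephFS files.
aws s3 ls s3://my-bucket --endpoint-url http://rgw.example.com:7480

# s3fs can expose the same bucket as a FUSE filesystem, but this is a
# separate userspace shim, unrelated to CephFS.
s3fs my-bucket /mnt/s3 -o url=http://rgw.example.com:7480 \
    -o use_path_request_style
```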

Not sure if TernFS differs there, would be kind of nice to have the option of both kinds of access to the same data.

KaiserPro | 5 months ago

Ceph isn't that well suited for high performance. It's also young and more complex than you'd want it to be (i.e. you get an object store, on top of which you then have to put a filesystem layer).

If you want performance, then you'll probably want Lustre or GPFS, or, if you're rich, a massive Isilon system.