top | item 37407772


BrandonY | 2 years ago

Hi, Brandon from GCS here. If you're looking for all of the guarantees of a real POSIX filesystem, you want fast top-level directory listing for 100MM+ nested files, and POSIX permissions/owner/group and other file metadata are important to you, Gcsfuse is probably not what you're after. You might want something more like Filestore: https://cloud.google.com/filestore

We've got some additional documentation on how Gcsfuse differs from a proper POSIX filesystem, and on its limitations: https://cloud.google.com/storage/docs/gcs-fuse#expandable-1

Gcsfuse is a great way to mount Cloud Storage buckets and view them like they're in a filesystem. It scales quite well for all sorts of uses. However, Cloud Storage itself is a flat namespace with no built-in directory support. Listing the few top-level directories of a bucket with 100MM files more or less requires scanning over your entire list of objects, which means it's not going to be very fast. Listing objects in a leaf directory will be much faster, though.
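To make the flat-namespace point concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not how Cloud Storage or Gcsfuse is actually implemented: the bucket is modeled as a sorted list of object names, "directories" are just name prefixes ending in "/", and the function `list_prefix` and the sample names are made up for illustration. It counts how many names have to be examined for a top-level listing versus a leaf listing.

```python
from bisect import bisect_left


def list_prefix(sorted_names, prefix="", delimiter="/"):
    """Toy listing over a flat, sorted namespace (hypothetical helper).

    Returns (objects, subdirs, scanned), where `scanned` is the number of
    object names examined to produce the listing.
    """
    objects, subdirs = [], []
    scanned = 0
    # Jump to the first name that could match the prefix.
    start = bisect_left(sorted_names, prefix)
    for i in range(start, len(sorted_names)):
        name = sorted_names[i]
        if not name.startswith(prefix):
            break  # sorted order: nothing after this can match
        scanned += 1
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # No further delimiter: the object sits directly under the prefix.
            objects.append(name)
        else:
            # Collapse everything below the next delimiter into one "directory".
            sub = prefix + rest[:cut + 1]
            if not subdirs or subdirs[-1] != sub:
                subdirs.append(sub)
    return objects, subdirs, scanned


# Made-up bucket contents: 3 top-level prefixes x 100 subprefixes x 100 files.
names = sorted(
    f"{top}/{mid:03d}/file{leaf:03d}.bin"
    for top in ("logs", "models", "raw")
    for mid in range(100)
    for leaf in range(100)
)

_, top_dirs, top_scanned = list_prefix(names)
leaf_objects, _, leaf_scanned = list_prefix(names, prefix="logs/042/")

print(top_dirs, top_scanned)            # 3 "directories", 30000 names examined
print(len(leaf_objects), leaf_scanned)  # 100 objects, 100 names examined
```

In this model the top-level listing returns only three entries but has to walk every name in the bucket, while the leaf listing touches only the names under its own prefix.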


nyc_pizzadev|2 years ago

Thanks for the reply.

Our theoretical use case is 10+ PB and we need multiple TB/s of read throughput (maybe a fraction of that for writing). So I don’t think Filestore fits this scale, right?

As for the directory traversals, I guess caching might help here? Top level changes aren’t as frequent as leaf additions.

That being said, I don’t see any (caching) proxy support anywhere other than the Google CDN.

milesward|2 years ago

Brandon, I know why this was built, and I agree with your list of viable uses; that said, it strikes me as extremely likely to lead to gnarly support load, grumpy customers, and system instability when it is inevitably misused. What steps across all of the user interfaces is GCP taking to warn users who may not understand their workload characteristics at all as to the narrow utility of this feature?