nyc_pizzadev | 2 years ago
Let’s say I have a directory tree with 100MM files in a nested structure, where the average file is 4+ directories deep. When I `ls` the top few directories, is it fast? How long until I discover updates?
Reading the docs, it looks like it’s using this API for traversal [0]?
What about metadata like creation times, permission, owner, group?
Any consistency concerns?
[0] https://cloud.google.com/storage/docs/json_api/v1/objects/li...
BrandonY|2 years ago
We've got some additional documentation on the differences and limitations between Gcsfuse and a proper POSIX filesystem: https://cloud.google.com/storage/docs/gcs-fuse#expandable-1
Gcsfuse is a great way to mount Cloud Storage buckets and view them like they're in a filesystem. It scales quite well for all sorts of uses. However, Cloud Storage itself is a flat namespace with no built-in directory support. Listing the few top level directories of a bucket with 100MM files more or less requires scanning over your entire list of objects, which means it's not going to be very fast. Listing objects in a leaf directory will be much faster, though.
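To make the flat-namespace point concrete, here is a toy sketch in Python. It is a naive model of a delimiter-style listing over sorted object names, not how Cloud Storage is actually implemented; `list_dir` and the synthetic key layout are made up for illustration. It counts how many object names have to be examined to answer one "directory" listing.

```python
from bisect import bisect_left


def list_dir(sorted_keys, prefix="", delimiter="/"):
    """Emulate a prefix + delimiter listing over a flat, sorted key space.

    Returns (files, subdirs, keys_examined). Hypothetical helper, toy model only.
    """
    files, subdirs = [], set()
    examined = 0
    # Jump to the first key that could match the prefix.
    start = bisect_left(sorted_keys, prefix)
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        examined += 1
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something deeper exists: report only the immediate "subdirectory".
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(key)
    return files, sorted(subdirs), examined


# Synthetic bucket: 3 * 10 * 10 * 10 = 3000 objects, 3 directories deep.
keys = sorted(
    f"top{a}/mid{b}/leaf{c}/file{d}.bin"
    for a in range(3)
    for b in range(10)
    for c in range(10)
    for d in range(10)
)

_, top_dirs, top_cost = list_dir(keys)
leaf_files, _, leaf_cost = list_dir(keys, prefix="top1/mid4/leaf7/")

print(top_dirs, top_cost)          # 3 directories, every key examined
print(len(leaf_files), leaf_cost)  # 10 files, 10 keys examined
```

In this model the top-level listing touches all 3000 names to report 3 directories, while the leaf listing touches only its own 10. A real backend may be able to skip ahead between prefixes, so treat the numbers as an illustration of the shape of the problem, not a measurement.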
nyc_pizzadev|2 years ago
Our theoretical use case is 10+ PB, and we need multiple TB/s of read throughput (maybe a fraction of that for writing). So I don’t think Filestore fits this scale, right?
As for the directory traversals, I guess caching might help here? Top level changes aren’t as frequent as leaf additions.
That being said, I don’t see any (caching) proxy support anywhere other than the Google CDN.
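Roughly what I have in mind for the caching, as a sketch only: a small TTL cache in front of whatever does the listing, with a longer TTL for shallow prefixes than for leaves. Everything here (`ListingCache`, `ttl_by_depth`, the TTL values, the fake `fetch`) is hypothetical and not part of Gcsfuse or any Google API.

```python
import time


def ttl_by_depth(prefix):
    """Assumed policy: shallow prefixes change rarely, so cache them longer."""
    depth = prefix.count("/")
    return 300.0 if depth <= 1 else 10.0


class ListingCache:
    """Cache listings per prefix, expiring each entry after its TTL."""

    def __init__(self, fetch, ttl_for=ttl_by_depth, clock=time.monotonic):
        self._fetch = fetch      # callable: prefix -> listing
        self._ttl_for = ttl_for  # callable: prefix -> TTL in seconds
        self._clock = clock      # injectable so it can be tested
        self._entries = {}       # prefix -> (expires_at, listing)

    def list(self, prefix):
        now = self._clock()
        hit = self._entries.get(prefix)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh
        listing = self._fetch(prefix)
        self._entries[prefix] = (now + self._ttl_for(prefix), listing)
        return listing


# Demo with a fake clock and a fake backend that records each real fetch.
fake_now = [0.0]
fetches = []


def fake_fetch(prefix):
    fetches.append(prefix)
    return [prefix + "example"]


cache = ListingCache(fake_fetch, clock=lambda: fake_now[0])
cache.list("top/")
cache.list("top/a/b/")
fake_now[0] = 20.0        # 20 s later
cache.list("top/")        # shallow prefix: still cached
cache.list("top/a/b/")    # leaf prefix: expired, fetched again
print(fetches)
```

The trade-off is the one from my first question: the TTL is an upper bound on how long a reader can miss an update under that prefix, so it only helps if that staleness is acceptable.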
milesward|2 years ago
daviesliu|2 years ago
PS, I'm founder of JuiceFS.
[1] https://github.com/juicedata/juicefs
victor106|2 years ago
skrowl|2 years ago
[deleted]
kuchenbecker|2 years ago