jmakov|1 year ago
Would be interesting to know how files get stored. They don't mention any distributed FS solutions like SeaweedFS, so once a drive is full, does the file get sent to another one via some service? Also, ZFS seems an odd choice, since deletions (especially of small files) on a drive that's over 80% full are crazy slow.
ryao|1 year ago
For something like Fastmail, which has many users, unlink operations should already be issued in parallel, so unlinking on ZFS will not be slow for them.
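To illustrate why per-user deletions parallelize well, here is a minimal sketch (my own, not Fastmail's or Cyrus's code) that fans unlink calls out over a thread pool; the helper name and file layout are assumptions for the example:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def parallel_unlink(paths, workers=8):
    """Issue unlink() calls concurrently; each call releases the GIL
    while blocked in the kernel, so the deletions overlap in the
    filesystem rather than running strictly one after another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() also surfaces any OSError raised by an individual unlink
        list(pool.map(os.unlink, paths))

if __name__ == "__main__":
    # Create some scratch files, then remove them in parallel.
    tmpdir = tempfile.mkdtemp()
    files = [os.path.join(tmpdir, f"msg{i}") for i in range(100)]
    for f in files:
        open(f, "w").close()
    parallel_unlink(files)
    print(all(not os.path.exists(f) for f in files))  # True
```

A mail server serving many mailboxes gets this concurrency naturally, since each user's operations arrive on separate connections.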
By the way, that 80% figure has not been true for more than a decade. You are referring to the threshold at which ZFS switches to the best-fit allocator to minimize external fragmentation under low-space conditions. The new figure is 96%. It is controlled by metaslab_df_free_pct in metaslab.c:
https://github.com/openzfs/zfs/blob/zfs-2.2.0/module/zfs/met...
Modification operations become slow when you are at or above 96% space filled, but that is to prevent even worse problems from happening. Note that my friend’s pool was below the 96% threshold when he was suffering from a slow rm -r. He just had a directory subtree with a large number of directory entries he wanted to remove.
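To make the threshold concrete: metaslab_df_free_pct defaults to 4, i.e. once free space drops below 4% (the pool is ~96% full), the dynamic-fit allocator falls back from first-fit to best-fit. A deliberately simplified model of that decision, with names of my own choosing, might look like:

```python
METASLAB_DF_FREE_PCT = 4  # OpenZFS default: switch allocators below 4% free

def allocator_for(free_bytes, total_bytes,
                  free_pct_threshold=METASLAB_DF_FREE_PCT):
    """Toy model of ZFS's dynamic-fit ("df") allocator choice:
    first-fit is fast; best-fit minimizes external fragmentation
    but is much slower, which is why writes degrade past ~96% full.
    (The real code also considers per-metaslab segment sizes.)"""
    free_pct = 100 * free_bytes / total_bytes
    return "first-fit" if free_pct >= free_pct_threshold else "best-fit"

print(allocator_for(10, 100))  # 10% free -> first-fit
print(allocator_for(3, 100))   # 3% free  -> best-fit
```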
For what it is worth, I am the ryao listed here and I was around when the 80% to 96% change was made:
https://github.com/openzfs/zfs/graphs/contributors
shrubble|1 year ago
Deletion of files depends on how they have configured the message store - they may be storing a lot of data into a database, for example.
ackshi|1 year ago
I think that 80% figure is from when drives were much smaller and finding free space over that threshold with the first-fit allocator was harder.
brongondwana|1 year ago
For now, the "file storage" product is a Node tree in mysql, with content stored in a content-addressed blob store, which is some custom crap I wrote 15 years ago that is still going strong because it's so simple there's not much to go wrong.
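A content-addressed blob store of that sort can indeed be very simple. Here is a hedged sketch (my own illustration, not the actual Fastmail code) where the SHA-256 of the content serves as the key, with a two-character fan-out directory so no single directory grows huge:

```python
import hashlib
import os
import tempfile

class BlobStore:
    """Minimal content-addressed store: a blob's SHA-256 is its name.
    Identical content deduplicates for free, and puts are idempotent."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Fan out on the first two hex chars to keep listings short.
        d = os.path.join(self.root, digest[:2])
        os.makedirs(d, exist_ok=True)
        path = os.path.join(d, digest)
        if not os.path.exists(path):  # content already stored: no-op
            with open(path, "wb") as f:
                f.write(data)
        return digest

    def get(self, digest: str) -> bytes:
        with open(os.path.join(self.root, digest[:2], digest), "rb") as f:
            return f.read()

if __name__ == "__main__":
    store = BlobStore(tempfile.mkdtemp())
    key = store.put(b"hello")
    print(store.get(key) == b"hello")  # True
```

The appeal is exactly the simplicity mentioned above: no index to corrupt, writes are naturally idempotent, and the key doubles as an integrity check on read.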
We do plan to eventually move the blob storage into Cyrus as well though, because then we have a single replication and backup system rather than needing separate logic to maintain the blob store.