For many small objects, a generic filesystem can be less efficient than a more specialised store. The filesystem manages things that your blob store doesn't need, block alignment can waste a lot of space, and directories with many files are often inefficient, which leads to hierarchical splitting that adds more inefficiency through indirection, etc. Some filesystems mitigate the space waste somewhat by supporting partial blocks, or by storing small files directly in the directory entry or another structure (the MFT in NTFS), but this adds extra complexity.
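To put a rough number on the block alignment point, here is a back-of-the-envelope sketch. The 4 KiB block size is an assumption (it is a common default, but check your own filesystem), and the helper names are made up for illustration. It ignores per-file metadata and any partial-block or inline-file tricks the filesystem may use.

```python
def allocated_size(file_size: int, block_size: int = 4096) -> int:
    """Bytes occupied on disk if data is rounded up to whole blocks."""
    if file_size == 0:
        return 0
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def wasted_fraction(sizes, block_size: int = 4096) -> float:
    """Share of the allocated space that holds no object data."""
    used = sum(sizes)
    allocated = sum(allocated_size(s, block_size) for s in sizes)
    if allocated == 0:
        return 0.0
    return (allocated - used) / allocated


# 1000 objects of 500 bytes each, one file per object, 4 KiB blocks (assumed).
print(f"{wasted_fraction([500] * 1000):.1%} of allocated space is padding")
```

With objects much larger than the block size, the same calculation gives a fraction close to zero, which is why this mostly hurts small-object workloads.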
The significance of these inefficiencies will vary depending on your base filesystem. The advantage of using your own storage format, rather than naively using a filesystem, is that you can design around these issues, making different choices about the trade-offs than a general filesystem might, to produce something that is both more space efficient and more efficient to query and update for typical blob access patterns.
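As a minimal sketch of what "your own storage format" could look like: append every blob to one large file and keep a key to (offset, length) index. This is not any particular product's design. The `PackStore` class and its methods are hypothetical, the index lives only in memory (a real store would persist it or make it rebuildable), and there is no deletion, compaction, or concurrency handling.

```python
import os


class PackStore:
    """Append blobs back to back in one file; index them by offset and length."""

    def __init__(self, path: str) -> None:
        self.index = {}  # key -> (offset, length); in memory only in this sketch
        self.f = open(path, "a+b")  # creates the file if missing, writes append

    def put(self, key: str, data: bytes) -> None:
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()  # blob starts at the current end of file
        self.f.write(data)
        self.f.flush()
        self.index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        self.f.seek(offset)
        return self.f.read(length)

    def close(self) -> None:
        self.f.close()
```

Blobs are packed with no padding between them, and a read is one index lookup plus one seek, with no directory traversal.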
Using a database rather than a filesystem is a middle ground and usually a compromise: still less efficient than a specially designed storage structure, but perhaps more efficient than a filesystem. Databases tend to have issues (not just inefficiencies) with large objects though, so your blob storage mechanism needs to work around those or just put up with them. A file-per-object store may also have a database anyway, for indexing purposes.
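For the database option, a minimal sketch using SQLite through Python's standard `sqlite3` module. The table name, column names, and the `db_put`/`db_get` helpers are assumptions made up for this example, and the whole blob is read into memory on each access, which is one of the large-object issues mentioned above.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a single-table blob store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "key TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def db_put(conn: sqlite3.Connection, key: str, data: bytes) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO blobs (key, data) VALUES (?, ?)",
            (key, data),
        )


def db_get(conn: sqlite3.Connection, key: str):
    row = conn.execute(
        "SELECT data FROM blobs WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]
```

The same kind of table, minus the `data` column and plus a path or offset, is roughly what an index database for a file-per-object store might hold.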
A huge advantage of one file per object is simplicity of implementation. Also, for some end users the result (a bunch of files rather than one large object) might fit better into their existing backup strategies¹. For many data and load patterns the disadvantages listed above may hardly matter, so the file-per-object approach can be an appropriate choice.
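To show how little code the file-per-object approach needs, here is a rough sketch. The two-level directory fan-out is the hierarchical splitting mentioned earlier. Hashing the key with SHA-256, the fan-out widths, and the `fs_put`/`fs_get` names are all assumptions for illustration, not a recommendation.

```python
import hashlib
from pathlib import Path


def blob_path(root: Path, key: str) -> Path:
    """Map a key to root/ab/cd/<full digest> to keep directories small."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / digest


def fs_put(root: Path, key: str, data: bytes) -> None:
    path = blob_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename into place so readers never see a partial file


def fs_get(root: Path, key: str) -> bytes:
    return blob_path(root, key).read_bytes()
```

Every object ends up as an ordinary file, so existing tools (rsync, file-level backups, `ls`) keep working on the store without knowing anything about it.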
--
[1] Assuming they are not relying on the distributed nature of the blob store², which is naive³ as it doesn't protect you against some things a backup does, unless the blob store implements features to help out there (a minimum distributed duplication guarantee for any given piece of data, keeping past versions, etc.).
[2] Also note that not all blob stores are distributed, and many that are distributed also support single-node operation.
[3] Perhaps we need a new variant of the "RAID is not a backup" mantra: "Distributed storage properties are not, by themselves, a backup" or some such.
The other commenter already outlined the main trade-offs, which boil down to increased latency and storage overhead for the file-per-object model. As for papers, I like the design of Haystack.
dspillett|2 years ago
pilgrim0|2 years ago
https://www.usenix.org/legacy/event/osdi10/tech/full_papers/...
ddorian43|2 years ago
XorNot|2 years ago