top | item 37299265

thrashh | 2 years ago

I mean you can design a filesystem to handle a million files extremely quickly... it just has to be in the requirements up front.

But there will be some trade-off.

And I don't think people generally put "a million files" in the requirements because it's fairly rare.

saltcured | 2 years ago

Not related to git (I hope), but a lot of scientific data/imaging folks seem to think file abstractions are free. I've seen more than one stack explode a _single_ microscope image into 100k files, so you'd hit 1M after trying to store just 10 microscope slides. Then, a realistic archive with thousands of images can hit a billion files before you know it.

It's hard to get people past the "works for me" demo phase, where they have only played with one image, to realize they really need a reasonable container format to play nice with the systems world outside their one task.
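The container-format point can be sketched with Python's standard library: packing all of a slide's tiles into one zip container leaves the filesystem with a single entry per slide instead of ~100k. The tile names and counts here are illustrative, not from any real microscopy stack.

```python
import io
import zipfile

def pack_slide(tiles: dict[str, bytes]) -> bytes:
    """Pack every tile of one slide into a single zip container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in tiles.items():
            zf.writestr(name, data)
    return buf.getvalue()

def read_tile(container: bytes, name: str) -> bytes:
    """Random access to one tile without unpacking the rest."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return zf.read(name)

# Small demo: 100 stand-in tiles (a real slide might have ~100k).
tiles = {f"tile_{i:06d}.raw": b"\x00" * 16 for i in range(100)}
blob = pack_slide(tiles)
assert read_tile(blob, "tile_000042.raw") == b"\x00" * 16
```

With ten slides stored this way, the archive holds 10 container files rather than 1,000,000 loose tiles; real-world stacks tend to reach for formats like HDF5 or Zarr for the same reason.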

bityard | 2 years ago

I was referring to general-purpose filesystems in common use today. Yes, there are a lot of special-purpose and experimental filesystems which are optimized for certain use cases, and a competent systems programmer could write one optimized specifically for small files, but these all have to make significant trade-offs.

didgetmaster | 2 years ago

It used to be much rarer. With 20 TB drives available today, it is much more common to need to handle that many files. When I designed my file system replacement (www.Didgets.com), I didn't just put 'a million files' in the requirements; I put 100x that in.

Now I have a system that will find subsets in just a second or two (even when the whole set contains hundreds of millions of files and any given subset might contain hundreds of thousands of matches). Here is a short video of a demo: https://www.youtube.com/watch?v=dWIo6sia_hw
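Sub-second subset queries over hundreds of millions of entries are the kind of thing an inverted index over file metadata buys you. This is a generic sketch of that idea, not Didgets' actual design; the class and tag names are hypothetical.

```python
from collections import defaultdict

class TagIndex:
    """Inverted index: tag -> set of file ids carrying that tag.

    A subset query is then a set intersection, not a scan over
    every file in the collection.
    """

    def __init__(self) -> None:
        self.index: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, file_id: int, tags: list[str]) -> None:
        for tag in tags:
            self.index[tag].add(file_id)

    def query(self, *tags: str) -> set[int]:
        """Ids matching every tag; intersect smallest set first."""
        sets = sorted((self.index[t] for t in tags), key=len)
        if not sets or not sets[0]:
            return set()
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result

idx = TagIndex()
idx.add(1, ["jpg", "2023"])
idx.add(2, ["jpg", "2022"])
idx.add(3, ["png", "2023"])
assert idx.query("jpg", "2023") == {1}
```

Starting the intersection from the smallest candidate set keeps the query cost proportional to the smallest subset rather than the whole collection, which is how such lookups stay fast as the file count grows.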