thasso | 9 months ago

For archive formats, or anything that has a table of contents or an index, consider putting the index at the end of the file so that you can append to it without moving a lot of data around. This also allows for easy concatenation.
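A minimal sketch of that layout, assuming a made-up format rather than any real archive spec: member data first, then a JSON index, then a fixed-size footer that records where the index starts. The names `FOOTER`, `write_archive`, `read_index` and `read_member` are hypothetical and only illustrate the idea.

```python
import io
import json
import struct

# Hypothetical footer layout: index offset and index length, both little-endian u64.
FOOTER = struct.Struct("<QQ")


def write_archive(stream, members):
    """Write member data first, then a JSON index, then the fixed-size footer."""
    index = {}
    for name, data in members.items():
        index[name] = [stream.tell(), len(data)]  # [offset, length] of this member
        stream.write(data)
    index_offset = stream.tell()
    index_bytes = json.dumps(index).encode("utf-8")
    stream.write(index_bytes)
    stream.write(FOOTER.pack(index_offset, len(index_bytes)))


def read_index(stream):
    """Find the index by reading the footer at the end of the file."""
    stream.seek(-FOOTER.size, io.SEEK_END)
    index_offset, index_len = FOOTER.unpack(stream.read(FOOTER.size))
    stream.seek(index_offset)
    return json.loads(stream.read(index_len).decode("utf-8"))


def read_member(stream, index, name):
    """Read one member's bytes using its index entry."""
    offset, length = index[name]
    stream.seek(offset)
    return stream.read(length)
```

With this layout a reader needs one seek to the footer and one read for the whole index, and member data never has to move when the index changes.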

zzo38computer|9 months ago

What would probably make concatenation even easier is storing the header of each file immediately before that file's data. You can build an index in memory when reading the file if that is helpful for your use.
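A rough sketch of this, assuming a hypothetical record layout (a small fixed header with the name length and data length, then the name, then the data). `HEADER`, `append_member` and `scan_index` are invented names, not part of any real format.

```python
import io
import struct

# Hypothetical per-member header: name length (u16) and data length (u64), little-endian.
HEADER = struct.Struct("<HQ")


def append_member(stream, name, data):
    """Append one record: header, then name, then data."""
    encoded = name.encode("utf-8")
    stream.seek(0, io.SEEK_END)
    stream.write(HEADER.pack(len(encoded), len(data)))
    stream.write(encoded)
    stream.write(data)


def scan_index(stream):
    """Build an in-memory index by walking the headers from the start."""
    index = {}
    stream.seek(0)
    while True:
        raw = stream.read(HEADER.size)
        if not raw:
            break  # clean end of archive
        if len(raw) < HEADER.size:
            raise ValueError("truncated header")
        name_len, data_len = HEADER.unpack(raw)
        name = stream.read(name_len).decode("utf-8")
        index[name] = (stream.tell(), data_len)  # (data offset, data length)
        stream.seek(data_len, io.SEEK_CUR)  # skip over the data to the next header
    return index
```

Concatenating two such archives is just concatenating their bytes, since nothing in a record refers to an absolute position. The cost is that `scan_index` does one read and one seek per member.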

HelloNurse|9 months ago

This would require a separate seek and read operation per archive member, each yielding only one directory entry, rather than a few read operations to load the whole directory at once.

charcircuit|9 months ago

Why not put it at the beginning, so that it is available at the start of the file stream? That way it is easy to fetch first, and you know which other ranges of the file you may need.

>This also allows for easy concatenation.

How would it be easier than putting it at the front?

shakna|9 months ago

Files are... Flat streams. Sort of.

So if you rewrite an index at the head of the file and it outgrows whatever padding was reserved for it, you may end up having to rewrite everything that comes afterwards to push it further down the file. That makes appending an extremely slow operation.

Whereas seeking to end, and then rewinding, is not nearly as costly.

lifthrasiir|9 months ago

If the archive is being updated in place, turning ABC# into ABCD#' (where # and #' are indices) is easier than turning #ABC into #'ABCD. The actual position of indices doesn't matter much if the stream is seekable. I don't think the concatenation is a good argument though.
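To illustrate the ABC# to ABCD#' case, here is a sketch under an assumed layout (data, then a JSON index, then a fixed footer holding the index offset and length). The layout and the name `append_in_place` are hypothetical. The new member overwrites the old index, and a fresh index is written after it, so A, B and C are never touched.

```python
import io
import json
import struct

# Hypothetical footer layout: index offset and index length, both little-endian u64.
FOOTER = struct.Struct("<QQ")


def append_in_place(stream, name, data):
    """Turn ABC# into ABCD#' without moving the existing member data."""
    # Load the old index (#) via the footer at the end of the file.
    stream.seek(-FOOTER.size, io.SEEK_END)
    index_offset, index_len = FOOTER.unpack(stream.read(FOOTER.size))
    stream.seek(index_offset)
    index = json.loads(stream.read(index_len).decode("utf-8"))

    # Write the new member (D) over the old index.
    stream.seek(index_offset)
    stream.write(data)
    index[name] = [index_offset, len(data)]

    # Write the new index (#') and footer after the new member.
    new_index_offset = stream.tell()
    index_bytes = json.dumps(index).encode("utf-8")
    stream.write(index_bytes)
    stream.write(FOOTER.pack(new_index_offset, len(index_bytes)))
    stream.truncate()  # drop any leftover bytes past the new footer
```

The work done is proportional to the size of the new member plus the index, not to the size of the archive. A real format would also need to think about crash safety, since the old index is destroyed before the new one is complete.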

MattPalmer1086|9 months ago

Imagine you have a 12 GB zip file, and you want to add one more file to it. Very easy and quick if the index is at the end, very slow if it's at the start (assuming your index now needs more space than is currently available).

Reading the index from the end of the file is also quick; where you read next depends on what you are trying to find in it, which may not be the start.

McGlockenshire|9 months ago

> How would it be easier than putting it at the front?

Have you ever wondered why `tar` is the Tape Archive? Tape. Magnetic recording tape. You stream data to it, and rewinding is Hard, so you put the list of files you just dealt with at the very end. This now-obsolete hardware expectation touches us decades later.