meehai | 1 year ago

couldn't agree more!

We need to separate and design modules as unitary as possible:

- zip should ARCHIVE/COMPRESS, i.e. reduce the file size and create a single file from the file system point of view. The complexity should go in the compression algorithm.

- Sharding/sending multiple coherent pieces of the same file (zip or not) is a different module and should be handled by specialized and agnostic protocols that do this like the ones you mentioned.

People keep building tools that handle two or more use cases instead of following the UNIX principle of creating generic, good single respectability tools that can be combined (thus allowing a 'whitelist' of combinations known to be safe). Quite frankly it's annoying, and it very often leads to issues like this one, which weren't even thought of in the original design because the number of possible combinations grows exponentially.

TeMPOraL|1 year ago

Well, 1) is zip with compression into a single file, 2) is zip without compression into multiple files. You can also combine the two. And in all cases, you need a container format.

The tasks are related enough that I don't really see the problem here.

meehai|1 year ago

I meant that they should be separate tools that can be piped together. For example: you have one directory with many files (1 GB in total)

`zip -r out.zip dir/`

This results in a single out.zip file that is, let's say, 500 MB (2:1 compression).

If you want to shard it, you have a separate tool, let's call it `shard` that works on any type of byte streams:

`shard -I out.zip -O out_shards/ --shard_size 100Mb`

This results in `out_shards/1.shard, ..., out_shards/5.shard`, each 100 MB.

And then you have the opposite: `unshard` (back into 1 zip file) and `unzip`.

No need for 'sharding' to exist as a feature in the zip utility.

And... if you want only the shard from the get go without the original 1 file archive, you can do something like:

`zip -r - dir/ | shard -O out_shards/`

Now, these can be copied to the floppy disks (as discussed above) or sent via the network etc. The main thing here is that the sharding tool works on bytes only (doesn't know if it's an mp4 file, a zip file, a txt file etc.) and does no compression and the zip tool does no sharding but optimizes compression.
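To make the "works on bytes only" point concrete, here is a minimal Python sketch of what such a tool could look like. The `shard` and `unshard` functions and the `N.shard` naming are hypothetical, taken from the made-up commands above rather than from any existing utility. Nothing in it knows or cares whether the input is a zip file.

```python
import os
import shutil
from typing import BinaryIO, List


def _read_up_to(src: BinaryIO, n: int) -> bytes:
    """Read until n bytes are collected or the stream ends.

    Pipes can return short reads, so a single read(n) is not enough.
    """
    parts = []
    remaining = n
    while remaining > 0:
        data = src.read(remaining)
        if not data:
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def shard(src: BinaryIO, out_dir: str, shard_size: int) -> List[str]:
    """Split any byte stream into 1.shard, 2.shard, ... of at most shard_size bytes."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    index = 1
    while True:
        # Simplification: one whole shard is held in memory at a time.
        chunk = _read_up_to(src, shard_size)
        if not chunk:
            break
        path = os.path.join(out_dir, f"{index}.shard")
        with open(path, "wb") as f:
            f.write(chunk)
        paths.append(path)
        index += 1
    return paths


def unshard(shard_dir: str, dst: BinaryIO) -> None:
    """Concatenate the shards back into one stream, in numeric order."""
    names = [n for n in os.listdir(shard_dir) if n.endswith(".shard")]
    # Sort numerically so that 10.shard comes after 2.shard.
    names.sort(key=lambda n: int(n[: -len(".shard")]))
    for name in names:
        with open(os.path.join(shard_dir, name), "rb") as f:
            shutil.copyfileobj(f, dst)
```

Passing `sys.stdin.buffer` as `src` would give the piped variant. The only contract between the two halves is byte order, which is why the shard names have to sort numerically.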

rakoo|1 year ago

The problem seems to be that each individual split part is valid in itself. This means that the entire file, with the central directory at the end, can diverge from each entry. This is the original issue.

murderfs|1 year ago

Why do you believe that archiving and compressing belong in the same layer more than sharding does? The unixy tool isn't zip, it's tar | gzip.

edflsafoiewq|1 year ago

tar|gzip does not allow random access to files. You have to decompress the whole tarball (up to the file you want).
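A rough way to see the difference, sketched with Python's standard `zipfile` and `tarfile` modules. The helper names are made up for illustration. The zip reader looks the member up in the central directory and seeks to it, while the gzip'd tar is opened in stream mode (`r|gz`), which cannot seek and has to walk the members in archive order.

```python
import io
import tarfile
import zipfile
from typing import Dict, Tuple


def build_archives(files: Dict[str, bytes]) -> Tuple[bytes, bytes]:
    """Pack the same files into an in-memory .zip and an in-memory .tar.gz."""
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)

    tgz_buf = io.BytesIO()
    with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))

    return zip_buf.getvalue(), tgz_buf.getvalue()


def read_from_zip(zip_bytes: bytes, name: str) -> bytes:
    """Random access: the central directory says where the member starts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(name)


def read_from_tgz(tgz_bytes: bytes, name: str) -> Tuple[bytes, int]:
    """Sequential access: decompress and skip members until the name matches.

    Returns the member's bytes and how many members were passed over first.
    """
    skipped = 0
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r|gz") as tf:
        for member in tf:
            if member.name == name:
                return tf.extractfile(member).read(), skipped
            skipped += 1
    raise KeyError(name)
```

With three files in the archive, asking the tarball for the last one means decompressing past the first two, while the zip lookup does not touch them.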

chrisweekly|1 year ago

I agree!

Also, I enjoyed your Freudian slip:

single respectability tools

->

single responsibility tools