meehai | 1 year ago

I meant that they should be separate tools that can be piped together. For example: you have one directory of many files (1 GB in total)

`zip -r out.zip dir/`

This results in a single out.zip file that is, let's say, 500 MB (2:1 compression).

If you want to shard it, you have a separate tool, let's call it `shard` that works on any type of byte streams:

`shard -I out.zip -O out_shards/ --shard_size 100MB`

This results in `out_shards/1.shard, ..., out_shards/5.shard`, each 100 MB.

And then you have the opposite: `unshard` (back into 1 zip file) and `unzip`.
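
To make the idea concrete, here is a minimal Python sketch of what such a `shard` / `unshard` pair could look like. Everything here is hypothetical: the function names, the `N.shard` naming, and the `block_size` parameter are my own assumptions, not an existing tool. It copies in small blocks, so it never holds a whole shard (or the whole archive) in memory.

```python
import shutil
from pathlib import Path
from typing import BinaryIO, List


def shard(stream: BinaryIO, out_dir: Path, shard_size: int,
          block_size: int = 64 * 1024) -> List[Path]:
    """Split any byte stream into 1.shard, 2.shard, ... of at most shard_size bytes."""
    if shard_size <= 0 or block_size <= 0:
        raise ValueError("shard_size and block_size must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    out = None
    room = 0  # bytes still allowed in the currently open shard
    try:
        while True:
            # Never read more than the current shard can still take.
            want = min(block_size, room if room else shard_size)
            block = stream.read(want)
            if not block:
                break  # end of input
            if room == 0:
                # Current shard is full (or none is open yet): start the next one.
                if out is not None:
                    out.close()
                path = out_dir / f"{len(paths) + 1}.shard"
                out = path.open("wb")
                paths.append(path)
                room = shard_size
            out.write(block)
            room -= len(block)
    finally:
        if out is not None:
            out.close()
    return paths


def unshard(in_dir: Path, out: BinaryIO) -> None:
    """Concatenate the shards back into one stream, in numeric order."""
    # Sort by number, not by name, so that 10.shard comes after 9.shard.
    for path in sorted(in_dir.glob("*.shard"), key=lambda p: int(p.stem)):
        with path.open("rb") as src:
            shutil.copyfileobj(src, out)
```

Since `shard` accepts any binary stream, passing `sys.stdin.buffer` would cover the piped case, and the code never looks at what the bytes mean. On most Unix systems `split -b` and `cat` already cover the same ground.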

No need for 'sharding' to exist as a feature in the zip utility.

And... if you want only the shards from the get-go, without the original single-file archive, you can do something like:

`zip -r - dir/ | shard -O out_shards/`

Now these can be copied to floppy disks (as discussed above), sent over the network, etc. The main point is that the sharding tool works on bytes only (it doesn't know whether it's an mp4 file, a zip file, a txt file, etc.) and does no compression, while the zip tool does no sharding but optimizes compression.

kd5bjo|1 year ago

The key thing that you get by integrating the two tools is the ability to more easily extract a single file from a multipart archive. Instead of having to reconstruct the entire file, you can look in the part/diskette with the index to find out which other part/diskette you need to use to get at the file you want.
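
A rough Python sketch of that lookup, under simplifying assumptions: `IndexEntry`, `parts_needed`, and `find_parts` are made-up names, the parts are assumed to be fixed-size and numbered from 1, and the index is assumed to store a plain byte offset per file. This is not the actual ZIP on-disk layout (as far as I know, the real central directory records a starting disk number for each entry), just an illustration of why having the index lets you skip most of the parts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IndexEntry:
    name: str
    offset: int  # byte offset of the entry within the whole archive
    size: int    # number of bytes the entry occupies in the archive


def parts_needed(entry: IndexEntry, part_size: int) -> range:
    """Return the 1-based part numbers that contain any byte of the entry."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    first = entry.offset // part_size
    # Treat an empty entry as occupying one byte so it still maps to a part.
    last = (entry.offset + max(entry.size, 1) - 1) // part_size
    return range(first + 1, last + 2)


def find_parts(index: Dict[str, IndexEntry], name: str, part_size: int) -> range:
    """Look a file up in the index and report which parts to fetch."""
    return parts_needed(index[name], part_size)
```

With 100-byte parts, an entry at offset 250 that is 100 bytes long spans bytes 250 to 349, so only parts 3 and 4 are needed and the rest can stay in the drawer.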

canucker2016|1 year ago

Don't forget that with this two-step method, you also need enough disk space to hold the entire ZIP archive before it's sharded.

AFAIK you can create a ZIP archive saved to floppy disks even if your source hard disk has low/almost no free space.

Phil Katz (creator of the ZIP file format) had a different set of design constraints.