542458 | 2 months ago

Okay, so I know back in the day you could choke scanning software (i.e. email attachment scanners) by throwing a zip bomb at it. I believe the software has gotten smarter these days, so it won't simply crash when that happens. But how is this done? How does one detect a zip bomb?


danudey|2 months ago

I don't understand the code itself, but here's Debian's patch to detect overlapping zip bombs in `unzip`:

https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...

    The detection maintains a list of covered spans of the zip files
    so far, where the central directory to the end of the file and any
    bytes preceding the first entry at zip file offset zero are
    considered covered initially. Then as each entry is decompressed
    or tested, it is considered covered. When a new entry is about to
    be processed, its initial offset is checked to see if it is
    contained by a covered span. If so, the zip file is rejected as
    invalid.
So effectively it seems as though it just keeps track of which parts of the zip file have already been 'used', and if a new entry starts inside a 'used' section, it rejects the file.
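A minimal Python sketch of that bookkeeping (this is my own illustration, not Debian's actual C code; the class and method names are invented): keep a sorted list of covered spans, and reject any entry whose starting offset already falls inside one.

```python
import bisect

class SpanTracker:
    """Tracks covered [start, end) byte spans of an archive."""

    def __init__(self):
        self.spans = []  # sorted, non-overlapping (start, end) tuples

    def is_covered(self, offset):
        """True if offset lies inside an already-covered span."""
        # Find the last span whose start is <= offset.
        i = bisect.bisect_right(self.spans, (offset, float("inf"))) - 1
        return i >= 0 and self.spans[i][0] <= offset < self.spans[i][1]

    def cover(self, start, end):
        """Mark [start, end) as covered, merging adjacent spans."""
        self.spans.append((start, end))
        self.spans.sort()
        merged = [self.spans[0]]
        for s, e in self.spans[1:]:
            if s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.spans = merged
```

An unzipper would seed the tracker with the central directory span and any leading bytes, call `is_covered(entry_offset)` before processing each entry, and `cover()` the entry's span after decompressing it. Overlapping-entry bombs fail the `is_covered` check on their second use of the same bytes.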

necovek|2 months ago

I wonder if this has actually been used for backing up in real use cases (think how LVM or ZFS do snapshotting)?

I.e. an advanced compressor could abuse the zip file format to share base data for files which only incrementally change (get appended to, for instance).

And then this patch would disallow such practice.

10000truths|2 months ago

For any compression algorithm in general, you keep track of A = {uncompressed bytes produced} and B = {compressed bytes consumed} while decompressing, and bail out when either of the following occurs:

1. A exceeds some unreasonable threshold

2. A/B exceeds some unreasonable threshold
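The two checks above can be sketched with `zlib`'s streaming API (the threshold values here are illustrative defaults, not standards; `safe_decompress` is my own name). The `max_length` argument to `decompressobj.decompress` bounds how much output one call can produce, so the thresholds get checked before a bomb can balloon in memory:

```python
import zlib

def safe_decompress(data, max_output=100 * 1024 * 1024,
                    max_ratio=1000, chunk_size=16 * 1024):
    """Decompress zlib data, bailing out on suspicious size or ratio."""
    d = zlib.decompressobj()
    out = bytearray()
    fed = 0  # B: compressed bytes consumed so far
    pos = 0
    while pos < len(data):
        piece = data[pos:pos + chunk_size]
        pos += len(piece)
        fed += len(piece)
        buf = piece
        while True:
            # max_length caps output per call; leftovers go to unconsumed_tail
            out += d.decompress(buf, chunk_size)
            if len(out) > max_output:            # check 1: A too large
                raise ValueError("uncompressed size exceeds threshold")
            if len(out) / fed > max_ratio:       # check 2: A/B too large
                raise ValueError("compression ratio exceeds threshold")
            buf = d.unconsumed_tail
            if not buf:
                break
    out += d.flush()
    return bytes(out)
```

The same shape works for any streaming decompressor; the only requirement is that the API lets you bound the output of each step so the checks run often enough.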

integralid|2 months ago

In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.

On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times, so it wouldn't necessarily trigger your A/B heuristic.

Finally, the A threshold just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If Deflate's authors had had this idea when they designed the algorithm, I bet files larger than an "unreasonable" 16MB would be forbidden.
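The null-byte point is easy to reproduce: a file of zeros compresses by well over two orders of magnitude under zlib's defaults, which is comfortably past many "unreasonable" ratio thresholds even though the file is completely benign.

```python
import zlib

zeros = b"\x00" * (1 << 20)          # 1 MiB of null bytes
packed = zlib.compress(zeros)
ratio = len(zeros) / len(packed)
print(f"{len(packed)} compressed bytes, ~{ratio:.0f}:1 ratio")
```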

nrhrjrjrjtntbt|2 months ago

Embarrassingly simple for a scanner, too, as you just mark the file as suspicious when this happens. You can be wrong sometimes; that's expected.