s0meON3 | 3 months ago

What about using zip bombs?

https://idiallo.com/blog/zipbomb-protection
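For context, the linked post's trick is to pre-build a gzip file full of zeros and serve it with Content-Encoding: gzip, so the requester pays the cost of inflating it. A minimal sketch of that idea in Python with Flask (the bot check, route, and file names here are illustrative placeholders, not the post's actual code):

    import gzip

    from flask import Flask, request, send_file

    def make_bomb(path="bomb.gz", size_gb=10):
        # 10 GB of zeros gzips down to roughly 10 MB (~1000:1).
        chunk = b"\0" * (1024 * 1024)  # 1 MiB of zeros
        with gzip.open(path, "wb", compresslevel=9) as f:
            for _ in range(size_gb * 1024):
                f.write(chunk)

    app = Flask(__name__)

    def looks_malicious(req):
        # Placeholder heuristic; swap in your own bot signals.
        return "bot" in (req.headers.get("User-Agent") or "").lower()

    @app.route("/trap")
    def trap():
        if looks_malicious(request):
            # Serve the pre-compressed file as-is; the client's HTTP
            # library inflates it to the full decompressed size.
            resp = send_file("bomb.gz")
            resp.headers["Content-Encoding"] = "gzip"
            return resp
        return "nothing to see here"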

lavela | 3 months ago

"Gzip only provides a compression ratio of a little over 1000: If I want a file that expands to 100 GB, I’ve got to serve a 100 MB asset. Worse, when I tried it, the bots just shrugged it off, with some even coming back for more."

https://maurycyz.com/misc/the_cost_of_trash/
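The quoted ceiling is easy to sanity-check with the standard library; a small sketch (all-zero input, so the ratio is near DEFLATE's best case):

    import gzip

    raw = b"\0" * (100 * 1024 * 1024)        # 100 MB of zeros
    packed = gzip.compress(raw, compresslevel=9)
    print(len(packed))                       # on the order of 100 KB
    print(len(raw) / len(packed))            # ratio around 1000:1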

LunaSea | 3 months ago

You could try other compression methods supported by browsers, like Brotli.

Otherwise you can also chain compression methods, e.g. "Content-Encoding: gzip, gzip".
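A rough sketch of the chaining idea, assuming the client honors multiple codings (they're comma-separated in the header and applied in the order listed):

    import gzip

    raw = b"\0" * (10 * 1024 * 1024)         # 10 MB of zeros
    once = gzip.compress(raw, compresslevel=9)
    # The first pass's output is itself highly repetitive, so a
    # second pass shrinks it considerably further.
    twice = gzip.compress(once, compresslevel=9)
    print(len(once), len(twice))
    # Serve `twice` with: Content-Encoding: gzip, gzip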

jeroenhd | 3 months ago

Modern browsers support Brotli and zstd, which compress a lot better. They're perhaps not as good for on-the-fly compression, but static assets can get a nice benefit out of them.

With toxic AI scrapers like Perplexity moving more and more to headless web browsers to bypass bot blocks, I think a Brotli bomb (100 GB of \0 compresses to about 78 KiB) would be quite effective.
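Figures in that ballpark are easy to check with the Brotli bindings (pip install Brotli); a hedged sketch at a smaller scale, since exact sizes depend on the quality and window settings:

    import brotli

    raw = b"\0" * (100 * 1024 * 1024)        # 100 MiB of zeros
    packed = brotli.compress(raw, quality=11)
    # Brotli encodes long runs extremely cheaply, which is how
    # 100 GB of zeros can land in the tens of KiB.
    print(len(packed))

Served with Content-Encoding: br, a headless browser that advertises br support will happily inflate it.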

kalkin | 3 months ago

Ah, cool, that site's robots.txt is still broken, just like it was when it first came up on HN...

renegat0x0 | 3 months ago

Even I, who does not know much, implemented a workaround.

I have a web crawler with both a scraping byte limit and a timeout, so zip bombs don't bother me much.

https://github.com/rumca-js/crawler-buddy

I think garbage blabber would be more effective.
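For reference, that kind of defense is just a streamed read with a hard cap; a minimal sketch with requests (the cap, timeout, and function name are illustrative):

    import requests

    MAX_BYTES = 5 * 1024 * 1024   # 5 MB cap on decoded content

    def fetch_capped(url):
        body = b""
        # Note: timeout covers connect/read waits, not total duration.
        with requests.get(url, stream=True, timeout=10) as r:
            # iter_content yields *decoded* bytes, so the cap applies
            # after gzip/deflate inflation -- exactly where bombs bite.
            for chunk in r.iter_content(chunk_size=65536):
                body += chunk
                if len(body) > MAX_BYTES:
                    break  # likely a bomb or runaway page; bail out
        return body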