kitku | 5 months ago

This reminds me of the Nepenthes tarpit [1], an endless source of garbled pages generated on the fly that link back to itself over and over.

Probably more effective at poisoning the dataset if one has the resources to run it.

[1]: https://zadzmo.org/code/nepenthes/
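
For anyone wondering what "generated garbage that links to itself" could look like in practice, here is a minimal sketch. It is not Nepenthes's actual code: the word-level Markov chain, the tiny `CORPUS`, the `/maze/` URL prefix, and the function names are all assumptions made up for illustration. Each page is seeded from its own path, so the same URL always returns the same text and links, and nothing has to be stored.

```python
import hashlib
import random
from collections import defaultdict

# Hypothetical seed corpus; a real tarpit would train on something much larger.
CORPUS = (
    "the crawler follows every link and the link leads to another page "
    "and the page holds more text and the text leads the crawler on"
)


def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def babble(chain, rng, length=40):
    """Walk the chain to produce plausible-looking nonsense."""
    word = rng.choice(sorted(chain))
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (word never had a successor in the corpus): restart anywhere.
        word = rng.choice(followers) if followers else rng.choice(sorted(chain))
        out.append(word)
    return " ".join(out)


def render_page(path, chain, n_links=5):
    """Build an HTML page whose text and links depend only on the path."""
    # Seeding from a hash of the path keeps every URL stable across visits.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    # Every link points deeper into the same (assumed) /maze/ prefix.
    links = "".join(
        f'<a href="/maze/{rng.getrandbits(48):012x}">{babble(chain, rng, 3)}</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{babble(chain, rng)}</p>\n{links}</body></html>"
```

Calling `render_page("/maze/start", build_chain(CORPUS))` from any HTTP handler gives a page with five links, each of which leads to another such page, so a crawler that follows them never runs out.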

fleebee|5 months ago

I'm running Iocaine [1], which is essentially the same thing, on my tiny $3/mo VPS, and it's handling crawlers bombarding the honeypot with ~12 requests per second just fine. It's using about 30 MB of RAM.

[1]: https://iocaine.madhouse-project.org/

treetalker|5 months ago

Odorless, tasteless, and among the more deadly poisons known to crawlers!

8organicbits|5 months ago

Do we know if LLM scrapers are running JavaScript on the pages? If they are, maybe it's worth offloading the Markov model to the client side.