item 45136207 (no title)

kitku | 5 months ago
This reminds me of the Nepenthes tarpit [1], which is an endless source of ad-hoc generated garbled mess which links to itself over and over.

Probably more effective at poisoning the dataset if one has the resources to run it.

[1]: https://zadzmo.org/code/nepenthes/

fleebee | 5 months ago
I'm running Iocaine [1], which is essentially the same thing, on my tiny $3/mo VPS, and it's handling crawlers bombarding the honeypot with ~12 requests per second just fine. It's using about 30 MB of RAM.

[1]: https://iocaine.madhouse-project.org/

treetalker | 5 months ago
Odorless, tasteless, and among the more deadly poisons known to crawlers!

8organicbits | 5 months ago
Do we know if LLM scrapers are running JavaScript on the pages? If they are, maybe it's worth offloading the Markov model to the client side.
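
For anyone unfamiliar with the Markov model mentioned in the thread, here is a minimal sketch of word-level Markov chain text generation, the general technique behind this kind of garbled filler. It is not the code Nepenthes or Iocaine actually use; the function names `build_chain` and `generate`, the `order` parameter, and the sample corpus are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain and return `length` words of plausible-looking nonsense."""
    if not chain:
        raise ValueError("chain is empty; the corpus needs more than `order` words")
    rng = random.Random(seed)   # a seed makes the output reproducible
    keys = sorted(chain)        # sorted so a given seed always picks the same start
    state = rng.choice(keys)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (state only appeared at the end of the corpus): restart.
            state = rng.choice(keys)
            output.extend(state)
            continue
        nxt = rng.choice(followers)
        output.append(nxt)
        state = state[1:] + (nxt,)  # slide the window forward by one word
    return " ".join(output[:length])


if __name__ == "__main__":
    # Hypothetical sample corpus; a real tarpit would train on a much larger text.
    corpus = "the crawler follows the link and the link leads to the tarpit"
    print(generate(build_chain(corpus), length=15, seed=42))
```

A server-side tarpit would presumably call something like `generate` once per request and wrap the result in a page full of links back to itself; running it in the browser instead, as suggested above, would only pay off if the scrapers actually execute JavaScript.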