top | item 45964136

frameset | 3 months ago

It actually is.

I run a small video game forum with posts going back to 2008. We got absolutely smashed by bots scraping for training data for LLMs.

So I put it behind Cloudflare and now it's down. Ho hum.

watermelon0|3 months ago

Have you tried Anubis or similar tools? I've had similar issues with bot scraping of a forum taking all server resources, and using PoW challenge solved the problem.

https://github.com/TecharoHQ/anubis
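For readers unfamiliar with the idea, here is a minimal hashcash-style sketch of a proof-of-work challenge in Python. It is not Anubis's actual scheme; the function names, the `challenge:nonce` string format, and the difficulty value are all assumptions made for illustration. The server hands out a random challenge, the client brute-forces a nonce whose SHA-256 digest starts with enough zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit,
        # so 8 - bit_length() is the number of leading zeros in this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> str:
    """Server side: issue a random, single-use challenge string (hypothetical format)."""
    return os.urandom(16).hex()


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check a claimed solution."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until one meets the difficulty target.

    Expected work is roughly 2**difficulty hashes.
    """
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=12)  # difficulty chosen arbitrarily for the demo
    print(challenge, nonce, verify(challenge, nonce, 12))
```

The asymmetry is the point: verification costs one hash, while solving costs thousands, which is negligible for one human visitor but adds up for a scraper requesting every page.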

gspr|3 months ago

I've always wondered: has there been any effort to implement a PoW challenge like that at a lower level? I.e., TCP but the handshake requires solving a challenge, otherwise the connection is just closed? It seems like something that could benefit from being invisible on the application layer.

Edit: To answer my own question, yes: http://www.arijuels.com/wp-content/uploads/2013/09/JB99.pdf

Edit 2: Maybe TLS would be another reasonable place for it?

frameset|3 months ago

I did! It's very cool tech. However, for our config it was easier to slap CF in front of it.

I will say one very appealing use of Anubis I'd love to try is using it as a Traefik middleware to protect services running in docker containers.

stevepotter|3 months ago

Can you please elaborate on “smashed”? I’m very interested

frameset|3 months ago

I took a screenshot of the graph in Cloudflare when I switched on the bot challenges.

https://i.ibb.co/qHCJyY7/image.png

I wrote the below to explain to our users what was happening, so apologies if the language is too simple for a HN reader.

- 0630, we switched our DNS to proxy through CF, starting the collection of data, and implemented basic bot protections

- Unfortunately whatever anti-bot magic they have isn't quite having the effect, even after two hours.

- 0830, I sign in and take a look at the analytics. It seems like <SITE NAME> is very popular in Vietnam, Brazil, and Indonesia.

- 0845, I make it so users from those countries have to pass a CF "challenge". This is similar to a CAPTCHA, but CF try to make it so there's no "choosing all the cars in an image" if they can help it.

- So far 0% of our Asian audience have passed a challenge.

trollbridge|3 months ago

Same problem here. If I didn't use Cloudflare, nearly all of my traffic would be (apparently misconfigured) scraper bots.

shaky-carrousel|3 months ago

It'd be funny if these bots were run by Cloudflare.

frameset|3 months ago

Ha, yeah. They seemed to mostly be in SE Asia.