lloydatkinson | 6 days ago
Turns out all of the major AI slop companies had been hounding our wiki constantly for months, and this had resulted in Apache spawning hundreds of instances, bringing the whole machine to a halt.
Millions upon millions of requests, hundreds of GB of bandwidth. Thankfully we're using Cloudflare, so we could block all of them except real search engine crawlers, and now we don't have any problems at all. I also made sure to tighten Apache's limits a bit.
From what I've read, forums, wikis, and git repos are the primary targets of harassment by these companies for some reason. The worst part is that these bots could just download a git repo or a wiki dump and do whatever they want with it, but instead they are designed to push maximum load onto their victims.
Our wiki, in total, is a few gigabytes. They crawled it thousands of times over.
toast0|6 days ago
Ugh, such a weird design. At least in my experience, you are better off setting Apache to always run the same number of instances and tuning that number as appropriate, rather than letting the instance count fluctuate under load.
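For anyone wanting to try that, here is a rough sketch of what a fixed instance count could look like. It assumes the prefork MPM (the thread doesn't say which MPM the wiki runs), and the number 50 is a placeholder to tune against available memory, not a recommendation.

```apache
# Sketch only: assumes mpm_prefork; 50 is an arbitrary placeholder.
<IfModule mpm_prefork_module>
    # Hard upper bound on the number of child processes.
    ServerLimit          50
    # Start all children up front instead of ramping up under load.
    StartServers         50
    # Ask Apache to keep this many idle children, so it never scales down.
    MinSpareServers      50
    # Apache bumps this to MinSpareServers + 1 when it is set equal or lower;
    # with the cap below, idle children can never exceed it anyway.
    MaxSpareServers      50
    # Cap on simultaneously served requests, i.e. on total children here.
    MaxRequestWorkers    50
</IfModule>
```

With every value pinned to the same number, the process count should stay flat whether the server is idle or being hammered, so the worst case is a full request queue rather than a machine that runs out of memory.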
lloydatkinson|6 days ago
mrweasel|6 days ago
lithos|6 days ago
Git content is likely to have code for the bot to train on.