top | item 45066494

(no title)

Operyl | 6 months ago

They're getting to the point of 200-300RPS for some of my smaller marketing sites, hallucinating URLs like crazy. It's fucking insane.

discuss

order

palmfacehn|6 months ago

You'd think they would have an interest in developing reasonable crawling infrastructure, like Google, Bing or Yandex. Instead they go all in on hosts with no metering. All of the search majors reduce their crawl rate as request times increase.

On one hand these companies announce themselves as sophisticated, futuristic and highly-valued, on the other hand we see rampant incompetence, to the point that webmasters everywhere are debating the best course of action.

esperent|6 months ago

I suspect it's because they're dealing with such unbelievable levels of bandwidth and compute for training and inference that the amount required to blast the entire web like this barely registers to them.

matt-p|6 months ago

Honestly it's just tragedy of the commons. Why put the effort in when you don't have to identify yourself, just crawl and if you get blocked move the job to another server.

whatevaa|6 months ago

They vibe code their crawlers.

matt-p|6 months ago

I'm seeing around the same, as a fairly constant base load. Even more annoying when it's hitting auth middleware constantly, over and over again somehow expecting a different answer.