item 45106323

braden_e | 6 months ago

There is a very large-scale crawler that uses random valid user agents and a staggeringly large pool of IPs. I first noticed it because a lot of traffic was coming from Brazil and "HostRoyale" (ASN 203020). They send only a few requests a day from each IP, so rate limiting is not useful.

I run a honeypot that generates URLs tagged with the source IP, so I am pretty confident it is all one bot: in the past 48 hours, over 200,000 IPs have hit the honeypot.
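A minimal sketch of this kind of IP-tagged honeypot URL. The HMAC scheme, names, and path layout here are my own assumptions for illustration, not necessarily what the commenter actually runs; the idea is just that each visitor is served a trap link that encodes their IP, so a later hit on that link can be attributed back to the IP it was minted for.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical per-site secret


def tag_url(client_ip):
    # Serve this visitor a trap URL encoding their IP plus a short MAC,
    # so the link can't be forged for an arbitrary IP.
    sig = hmac.new(SECRET, client_ip.encode(), hashlib.sha256).hexdigest()[:16]
    return "/trap/{}-{}".format(client_ip.replace(".", "-"), sig)


def attribute_hit(path):
    # Recover and verify the IP a trap link was originally minted for.
    # Returns None for paths that tag_url did not produce.
    try:
        token = path.rsplit("/", 1)[1]
        ip_dashed, sig = token.rsplit("-", 1)
    except (IndexError, ValueError):
        return None
    ip = ip_dashed.replace("-", ".")
    expected = hmac.new(SECRET, ip.encode(), hashlib.sha256).hexdigest()[:16]
    return ip if hmac.compare_digest(sig, expected) else None
```

If 200,000 distinct source IPs all request links minted for other IPs, the hits correlate: whatever fetched the page is sharing its crawl queue across the whole pool.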

I am pretty sure this is ByteDance: they occasionally hit these tagged honeypot URLs with their normal user agent, from their usual .sg datacenter.

candlemas | 6 months ago

My site has also recently been getting massively hit by Brazilian IPs. Each wave lasts a day or two, even when the IPs are being blocked.

kjkjadksj | 6 months ago

I wonder if you could implement a dummy rate limit: half the time, requests are rejected at random. A real user will think nothing of it and refresh the page.
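A minimal sketch of that coin-flip rate limiter, written as a generic handler wrapper. The handler shape, names, and 50% probability are assumptions for illustration, not from the thread.

```python
import random


def maybe_rate_limit(handler, p=0.5):
    # Hypothetical middleware sketch: refuse a random fraction p of requests
    # with a 429. A human retries and gets through on the next attempt; a
    # crawler spreading single requests across many IPs just loses ~p of
    # its fetches with no signal it can adapt to.
    def wrapped(request):
        if random.random() < p:
            return (429, "Too Many Requests")
        return handler(request)
    return wrapped
```

Whether the trade-off is worth it depends on how tolerant your real users are of spurious failures, which is the objection raised below.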

ronsor | 6 months ago

That will irritate real users half the time, while the bots won't care.

dizlexic | 6 months ago

I've written my own bots that do exactly this. My reason was mainly to avoid detection, so as part of that I also severely throttled my requests and hit the target at random intervals. In other words, I wasn't trying to abuse them; I just didn't want them to notice me.

TL;DR: it's trivial to send fake info when you're the one who controls the info.
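The throttled, randomly spaced fetch loop described above might look like this. Everything here (names, the 5-30 second window, the injectable sleep) is my own sketch, not the commenter's actual code; the point is only that random inter-request delays leave no fixed cadence for a rate limiter or anomaly detector to key on.

```python
import random
import time


def polite_fetch(urls, fetch, sleep=time.sleep, min_s=5.0, max_s=30.0):
    # Fetch each URL in turn, pausing a random interval between requests
    # so the traffic has no regular rhythm. sleep is injectable for testing.
    results = []
    for i, url in enumerate(urls):
        results.append(fetch(url))
        if i < len(urls) - 1:  # no pause needed after the final request
            sleep(random.uniform(min_s, max_s))
    return results
```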