top | item 45011319

that_lurker | 6 months ago

Why not just block the User Agent?

arewethereyeta|6 months ago

Because it's the single most falsifiable piece of information you would find on ANY "how to scrape for dummies" article out there. They all start with changing your UA.

ryantgtg|6 months ago

Sure, but the article is about a bot that expressly identifies itself in the user agent, and its user agent string contains a sentence suggesting you block its IP if you don’t like it. Since it uses at least 74 IPs, blocking its user agent seems like a fine idea.

aspenmayer|6 months ago

I think the UA is easily spoofed, whereas the AS and IP are less easily spoofed. You already have everything you need to spoof the UA, while spoofing your IP takes resources, whether it’s wall clock time to set it up, CPU time to insert another network hop, and/or peers or other third parties to route your traffic, and so on. The User-Agent is just a header value that you can easily change, with no real effort, expense, or third parties required.
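
To illustrate how little effort a UA change takes, here is a minimal Python sketch using the standard library's `urllib.request`. The URL and the UA string are made-up placeholders, and nothing is actually sent over the network; the point is only that the header is whatever the client decides to put there.

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    # The User-Agent is an ordinary request header chosen by the client,
    # so "spoofing" it is a one-line change.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical URL and UA string, for illustration only.
req = build_request("https://example.com/", "Mozilla/5.0 (hypothetical spoofed UA)")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Changing the source IP has no equivalent one-liner: the client needs another machine, proxy, or network path to send from.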

N_Lens|6 months ago

Bots often rotate the UA too; their entire goal is to get through and scrape as much content as possible, using any means possible.

lexicality|6 months ago

Because you have to parse the HTTP request to do that, while blocking the IP can be done at the firewall.
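
A rough Python sketch of that difference, under stated assumptions: the blocked network (a documentation-only range) and the UA substring are hypothetical, and a real firewall would do the IP check in the kernel or on a network device rather than in application code. The IP check needs only the peer address, while the UA check has to read and parse the request headers first.

```python
import ipaddress

# Hypothetical block lists, for illustration only.
BLOCKED_NETS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_UA_SUBSTRINGS = ["examplebot"]


def ip_blocked(peer_ip: str) -> bool:
    # Needs only the connection's source address; no request data is read.
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in BLOCKED_NETS)


def ua_blocked(raw_request: bytes) -> bool:
    # Needs the request bytes: split off the header section, skip the
    # request line, and look for a User-Agent header.
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "user-agent":
            ua = value.strip().lower()
            return any(s in ua for s in BLOCKED_UA_SUBSTRINGS)
    return False
```

For example, `ip_blocked("203.0.113.7")` can be answered before any bytes of the request arrive, whereas `ua_blocked(...)` only works once the server has accepted the connection and received the headers.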