KTibow | 7 months ago

It sucks more that Cloudflare/similar have responded to this with "if your handshake fingerprints more like curl than like Chrome/Firefox, no access for you".
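For anyone wondering what "handshake fingerprint" means here, one publicly documented scheme is JA3: take the TLS version, cipher suites, extensions, elliptic curves and point formats from the ClientHello, join them into a string, and MD5 it. Whether Cloudflare uses exactly this is an assumption on my part; the sketch below only illustrates the general idea, and the numeric values in the usage note are made up for the example rather than captured from a real client.

```python
import hashlib

# GREASE values (RFC 8701) are placeholder IDs some clients insert at random;
# JA3 drops them so they do not change the fingerprint.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five comma-separated fields, values joined by '-'."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string; this is the value that gets compared."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()
```

Calling `ja3_string(771, [0x0A0A, 4865, 4866], [0, 10, 11], [29, 23], [0])` gives `771,4865-4866,0-10-11,29-23,0`. Order matters, so two clients that offer the same ciphers in a different order hash differently, which is why a stock curl build does not look like Chrome even with a spoofed User-Agent.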

NoMoreNicksLeft|7 months ago

I now write all of my bots in JavaScript and run them from the Chrome console with CORS turned off. It seems to defeat even Google's anti-bot stuff. Of course, I need to restart Chrome every few hours because of memory leaks, but that beats getting banned: the last time I was banned from their ecosystem, it was a not-so-fun 3 days of my kids asking why they couldn't watch YouTube.

tomrod|7 months ago

Where can I learn more about custom bots in JS and Chrome?

edoceo|7 months ago

Or getting a CAPTCHA from Chrome when visiting a site you've been to dozens of times (Stack Overflow). Now I just skip that content; it's probably in my LLM already anyway.

codingminds|7 months ago

Keep in mind that those LLMs are one of the bigger reasons why we see more and more anti-bot behaviour on sites like SO.

The aggressive crawling to train them on everything is insane.

realusername|7 months ago

It's the same thing as the anti-piracy ads: you only annoy legit customers. This aggressive CAPTCHA campaign just makes Stack Overflow decline even faster than it otherwise would, by lowering its quality.

EPendragon|7 months ago

There are tools like curl-impersonate (https://github.com/lwthiker/curl-impersonate) that let you pretend to be any browser you like. It might take a bit of trial and error, but this mechanism can be bypassed with some persistence in identifying what it is that the resource is trying to block.
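A rough sketch of that trial and error, assuming curl-impersonate is installed and its wrapper scripts are on your PATH. The script names in `CANDIDATES` are assumptions about what your install provides, so check your own build; `first_accepted` and `fetch_status` are hypothetical helper names for this example, not part of the project. It only tries each profile in turn and reports the first one that gets a 200.

```python
import subprocess

# Assumed wrapper script names; verify which ones your install actually ships.
CANDIDATES = ["curl_chrome116", "curl_ff117", "curl_safari15_5"]


def fetch_status(command, url):
    """Run one wrapper script and return the HTTP status code (0 on any failure)."""
    try:
        result = subprocess.run(
            # Standard curl flags: silent, discard the body, print only the status.
            [command, "-s", "-o", "/dev/null", "-w", "%{http_code}", url],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Script not installed, not executable, or the request hung.
        return 0
    out = result.stdout.strip()
    return int(out) if out.isdigit() else 0


def first_accepted(url, candidates=CANDIDATES, probe=fetch_status):
    """Return the first candidate whose request gets a 200, or None if none do."""
    for command in candidates:
        if probe(command, url) == 200:
            return command
    return None
```

The `probe` parameter is there so the loop can be exercised without touching the network. Note that `/dev/null` makes this Unix-only as written, and a 200 does not prove you got the real page rather than a challenge page, so you would still want to look at the body.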