top | item 39241497


pragmar | 2 years ago

I'm not sure I have the best antibot solution, but it's handled through some crawler options exposed to the user. Out of the box, I use a (fast) HTTP crawler with my app's user-agent. It is not at all resilient to antibot. I direct users who are encountering issues to first try a user-agent override, and if that doesn't work, to next enable javascript crawling (think headless Chrome), which is slower and heavier but clears up a lot of issues. I don't have a strategy for aggressive antibot (captcha/etc.) other than to tell the customer to dial it back on their website.
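The escalation ladder described above could be sketched roughly like this (a minimal illustration with hypothetical names and block statuses, not the app's actual implementation): start with the cheap HTTP crawl under the app's own user-agent, and step up one rung only when a response looks blocked.

```python
from enum import Enum, auto
from typing import Optional

class Strategy(Enum):
    """Crawl strategies, cheapest first (order matters for escalation)."""
    HTTP_DEFAULT_UA = auto()   # fast HTTP client, app's own user-agent
    HTTP_BROWSER_UA = auto()   # same client, browser-like user-agent override
    HEADLESS_BROWSER = auto()  # headless Chrome: slow and heavy, but runs JS

# Statuses commonly returned by antibot layers (assumption, not exhaustive)
BLOCK_STATUSES = {403, 429, 503}

def next_strategy(current: Strategy, status: int) -> Optional[Strategy]:
    """Return the strategy to use for the next attempt.

    Keeps the current (cheap) strategy while responses succeed, escalates
    one rung when a fetch looks blocked, and returns None once the ladder
    is exhausted (i.e. aggressive antibot such as a captcha wall).
    """
    if status not in BLOCK_STATUSES:
        return current  # request succeeded; stay on the cheap path
    ladder = list(Strategy)
    idx = ladder.index(current)
    return ladder[idx + 1] if idx + 1 < len(ladder) else None
```

For example, a 403 on the default crawl would move the user to the user-agent override, a further block would move them to headless-browser crawling, and a block there means there is nothing left to try automatically.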

Edit: I've seen antibot SaaS providers, which claim to provide workarounds at a cost. You route traffic through their network, and they have teams that are constantly tweaking things to keep the requests working, much like scrapers adapting to website redesigns. It would be a treadmill to do on your own. There's more info on this at https://substack.thewebscraping.club/ In my case, selling single-user perpetual licenses, it doesn't make sense.

brunosutic | 2 years ago

Yes, I thought you played with those apps and their proxies to get around anti-bot protection.

I also don't have anti-bot implemented right now, but that's my next step. I mean, my app "has one job" and it's not doing it well because of anti-bot protection...

pragmar | 2 years ago

I think with the distributed nature of a desktop app (running headless Chrome), operating outside of cloud infrastructure/IPs, the internet is generally less defensive. That has bought some leeway, but I definitely should look into proxy support.