blister | 2 years ago

Hah, I literally just fought this for the past month. We run a large esports league that relies on player ranked data. They have the data, and as mentioned above, they send it down to the browser in beautiful JSON objects.

But they're sitting behind Cloudflare and aggressively blocking attempts to fetch data programmatically, which is a huge problem for us with 6000+ players' worth of data to fetch multiple times every 3 months.

So... I built a Chrome extension to grab the data at a rate that usually stays under their detection threshold. Basically, I created a distributed scraper and passed it out to as many people in the league as I could.

For big jobs where we want to do giant batches, it's a simple matter of doing the pulls and, when we start getting 429 errors (the HTTP "Too Many Requests" status they use for rate limiting), switching to a new IP on the VPN.
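
To illustrate what reacting to a 429 looks like in code, here is a minimal sketch. Note that it swaps in a different technique from the one described above: it backs off and retries instead of switching IPs. The `fetch` callable, its `(status, headers, body)` return shape, and the parameter names are all assumptions made for the example, and only the seconds form of `Retry-After` is handled.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def fetch_with_backoff(
    fetch: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() and wait before retrying whenever it returns HTTP 429."""
    for attempt in range(max_retries + 1):
        status, headers, body = fetch()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Prefer the server's own hint when it is given as whole seconds.
        # (Assumes the header key is spelled exactly "Retry-After".)
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Otherwise fall back to exponential backoff: 1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError("still rate limited after retries")
```

Passing `sleep` as a parameter is just a convenience so the waiting can be replaced in tests; in real use the default `time.sleep` applies.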

The only way they can block us now is if they stop having a website.

Give one of the commercial VPN providers a try. They're usually pretty cheap and have tons of IPs all over the place. Adding a "VPN Disconnect / Reconnect" step to the process only added about 10 seconds to the occasional request.

kbenson|2 years ago

It probably doesn't save you much, since you already built the Chrome extension, but having done both, I found that Tampermonkey is usually much easier to deal with and also much quicker to develop for (you can literally edit the script in the Tampermonkey extension settings page and reload the page you want it to apply to for immediate testing).

sublinear|2 years ago

I might be wrong, but some sites can block 'self' origin scripts by leaving it out of the Content Security Policy and only allowing scripts they control served by a CDN or specified subdomain to run on their page. Not sure when I last tried this and on what browser(s).

You'd have to disable CSP manually in your browser config to make it work, but that leaves you with an insecure browser and a lot of friction for casual users. Not sure if you can tie about:config options to a user profile for this use case. Distributing a working extension/script is getting harder all the time.
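
To make the CSP point concrete, here is a rough sketch of checking whether a `Content-Security-Policy` header value lists `'self'` as a script source. The function names are made up for this example, and the parsing is deliberately simplified: it ignores nonces, hashes, `'strict-dynamic'` and `script-src-elem`, and it says nothing about whether a given browser actually applies the policy to extension or userscript code.

```python
from typing import Dict, List


def parse_csp(header: str) -> Dict[str, List[str]]:
    """Split a CSP header value into {directive name: [source expressions]}."""
    directives: Dict[str, List[str]] = {}
    for part in header.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # Directive names are case-insensitive; keep the first occurrence
        # and ignore later duplicates of the same directive.
        directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def allows_self_scripts(header: str) -> bool:
    """Return True if same-origin scripts look permitted by this policy."""
    directives = parse_csp(header)
    # script-src takes precedence; default-src is the fallback.
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        # Neither directive is present, so this policy does not restrict scripts.
        return True
    return "'self'" in sources
```

For example, a policy such as `script-src https://cdn.example.com` (a placeholder host) leaves out `'self'`, so `allows_self_scripts` returns `False` for it.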

throwawayadvsec|2 years ago

a VPN won't do anything to help you with Instagram

the best of the best are 4G rotating proxies

the fingerprint needs to change also