wrath | 8 years ago

I can't provide details on anything we've built for sites like Google, but in general, if you want to crawl Google you'll want to get "many, many" IP addresses. I've heard of people using services like 2captcha.com, but the best way is to obfuscate who you are.

If you can hit Google 60 times per minute per IP before getting blocked and you need to crawl them 1000 times per minute, you need at least 17 IPs in rotation (1000 / 60 ≈ 16.7, rounded up). Randomize headers so the requests look like real people coming from schools, office buildings, etc. It's a lot of work, but possible.
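
To make the arithmetic concrete, here is a minimal sketch of the IP count calculation. The function name `min_ips` and its parameters are made up for illustration, and the 60-per-minute limit is only the hypothetical figure from the example above, not a known Google threshold.

```python
def min_ips(target_per_minute: int, limit_per_ip_per_minute: int) -> int:
    """Smallest number of IPs that keeps each IP at or under its per-minute limit.

    Both arguments are illustrative assumptions, not measured values.
    """
    if target_per_minute < 0 or limit_per_ip_per_minute <= 0:
        raise ValueError("target must be >= 0 and the per-IP limit must be > 0")
    # Integer ceiling division: round up, since a fraction of an IP is not possible.
    return (target_per_minute + limit_per_ip_per_minute - 1) // limit_per_ip_per_minute


if __name__ == "__main__":
    print(min_ips(1000, 60))  # 17
```

This is a lower bound: it assumes requests are spread perfectly evenly across the IPs, so in practice you would likely want some headroom on top of it.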
