thekeyper | 3 years ago

Hi! Very cool project. Just out of curiosity, what trips up Crawlee on CreepJS? I haven't heard of anyone actually using CreepJS in production (I don't think it's even meant for production use). It's certainly overzealous in its aggregate "trust score", but (a) it seems like a good benchmark to aim for, and (b) some of its sub-scores, like "stealth" and "like headless", might be helpful for Crawlee to evaluate, since the signals behind those analyses are fairly simple for people to throw together in their own custom (production) bot detection scripts and are fairly ubiquitous.

mnmkng|3 years ago

With fingerprints it's a tradeoff between having enough of them for large-scale scraping and staying consistent with your environment. E.g. you can get many more combinations if you also use Firefox, WebKit, macOS and Windows user agents (and fingerprints) while you're actually running Chrome on Linux, but you also expose yourself to the better detection algorithms, which can spot the mismatch. If you stick to Linux Chrome fingerprints only (which is what you usually run in VMs), you'll be less detectable, but you might get rate limited because you have fewer distinct fingerprints to rotate through.
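
To make the tradeoff concrete, here's a rough sketch in plain Python (this is not Crawlee's API; the browser/OS lists, `REAL_ENV`, and the helper names are all made up for illustration). It only counts how many browser/OS combinations you get in total versus how many match the machine you actually run on. Real fingerprints have many more dimensions than these two.

```python
from itertools import product

# Hypothetical value lists, for illustration only.
BROWSERS = ["chrome", "firefox", "webkit"]
SYSTEMS = ["linux", "macos", "windows"]

# Assumption: the scraper really runs Chrome on Linux (e.g. in a VM).
REAL_ENV = {"browser": "chrome", "os": "linux"}


def candidate_pool(browsers, systems):
    """Every browser/OS pair we could claim to be."""
    return [{"browser": b, "os": s} for b, s in product(browsers, systems)]


def is_consistent(fingerprint, env):
    """True if the claimed browser and OS both match the real environment."""
    return (
        fingerprint["browser"] == env["browser"]
        and fingerprint["os"] == env["os"]
    )


full_pool = candidate_pool(BROWSERS, SYSTEMS)
consistent_pool = [fp for fp in full_pool if is_consistent(fp, REAL_ENV)]

print(len(full_pool), len(consistent_pool))  # 9 1
```

Even with only two dimensions, the full pool is nine times larger than the consistent one, and every entry outside the consistent pool claims something the real environment can contradict.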