lwthiker | 3 years ago | on: Ask HN: What are the best tools for web scraping in 2022?
curl-impersonate [1] is a curl fork that I maintain, which lets you fetch sites while impersonating a browser. Unfortunately, TLS and HTTP fingerprinting of web clients has become extremely common over the past year or so, which means a regular curl request will often get a JS challenge instead of the real content. curl-impersonate helps with that.
[1] https://github.com/lwthiker/curl-impersonate
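To make "TLS fingerprinting" concrete, here is a minimal Python sketch of JA3, one widely published fingerprinting method (an assumption: the sites in question may use JA3 or something else entirely). JA3 takes five fields from the TLS ClientHello (version, cipher suites, extensions, elliptic curves, point formats), joins each list with `-` and the fields with `,`, then hashes the string with MD5. A server holding a table of known browser hashes can challenge any client whose hash doesn't match. The two ClientHellos below are hypothetical, not captured from curl or a real browser; they differ only in cipher order, which is already enough to change the fingerprint.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build a JA3 string: five comma-separated fields, lists joined by '-'."""
    def dash(values):
        return "-".join(str(v) for v in values)

    return ",".join([str(version), dash(ciphers), dash(extensions),
                     dash(curves), dash(point_formats)])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string, the value typically stored in lookup tables."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()


# Hypothetical ClientHello parameters, as decimal code points:
# 771 = 0x0303, 4865-4867 = the three TLS 1.3 AES-GCM/ChaCha20 suites,
# 29/23/24 = x25519/secp256r1/secp384r1.
EXTENSIONS = [0, 10, 11, 13, 43, 51]
CURVES = [29, 23, 24]
client_a = (771, [4865, 4866, 4867], EXTENSIONS, CURVES, [0])
client_b = (771, [4865, 4867, 4866], EXTENSIONS, CURVES, [0])  # same suites, new order

print(ja3_string(*client_a))  # 771,4865-4866-4867,0-10-11-13-43-51,29-23-24,0
print(ja3_hash(*client_a) == ja3_hash(*client_b))  # False
```

This is also why tweaking only the User-Agent header isn't enough: the fingerprint comes from the handshake itself, before any HTTP is sent.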
lwthiker | 3 years ago | on: Firefox appears to be flagged as suspicious by Cloudflare
For a while now, I haven't even been able to access WhatsApp Web [1] through Firefox. It gets stuck on the loading page and keeps refreshing itself forever. I have to resort to Chrome whenever I need WhatsApp on my laptop.
[1] https://web.whatsapp.com
lwthiker | 3 years ago | on: Firefox appears to be flagged as suspicious by Cloudflare
lwthiker | 4 years ago | on: Show HN: Curl modified to impersonate Firefox and mimic its TLS handshake
They don't block you completely; they just present you with a JS challenge that delays your access to the site. A browser, even one behind a MITM proxy, would be able to solve this challenge.
lwthiker | 4 years ago | on: Show HN: Curl modified to impersonate Firefox and mimic its TLS handshake
I hope to do so in the future. For now, the implementation is extremely hacky, so I doubt it would be accepted into curl.
lwthiker | 4 years ago | on: Show HN: Curl modified to impersonate Firefox and mimic its TLS handshake
I will try to impersonate Chrome next. However, I suspect this is going to be more challenging. Chrome uses BoringSSL, which curl does not support, so it means either forcing curl to compile with BoringSSL or modifying NSS to look like BoringSSL.
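One concrete trait of a BoringSSL ClientHello that NSS would have to imitate is GREASE (RFC 8701): Chrome inserts reserved dummy code points (0x0A0A, 0x1A1A, ..., 0xFAFA), chosen pseudo-randomly per connection, into lists such as its cipher suites and extensions. Below is a minimal Python sketch of recognising and inserting those values. `add_grease` is a hypothetical helper for illustration only, not part of curl, NSS or BoringSSL, and real BoringSSL places GREASE in several lists, not just the cipher list.

```python
import random

# RFC 8701 GREASE values: both bytes equal and each byte of the form 0x?A,
# i.e. 0x0A0A, 0x1A1A, ..., 0xFAFA (16 values in total).
GREASE_VALUES = [((i << 4) | 0x0A) * 0x0101 for i in range(16)]


def is_grease(value):
    """True if a 16-bit code point is one of the reserved GREASE values."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def strip_grease(values):
    """Drop GREASE entries, as fingerprinting tools such as JA3 do."""
    return [v for v in values if not is_grease(v)]


def add_grease(ciphers, rng=random):
    """Hypothetical helper: prepend one randomly chosen GREASE value."""
    return [rng.choice(GREASE_VALUES)] + list(ciphers)


ciphers = [0x1301, 0x1302, 0x1303]
hello = add_grease(ciphers)
print([hex(v) for v in hello])          # first entry varies from run to run
print(strip_grease(hello) == ciphers)   # True
```

Because the GREASE value changes between connections, fingerprinting tools discard it; what stays stable, and has to be matched, is everything else in the handshake.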
lwthiker | 4 years ago | on: Show HN: Curl modified to impersonate Firefox and mimic its TLS handshake
Thanks for the suggestion; I had no idea ESR was a thing. I've just added support for Firefox ESR 91 (it was pretty similar and only required adding one cipher to the cipher list and changing the user agent).
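The two knobs mentioned here (the TLS cipher list and the User-Agent header) can be illustrated with Python's standard `ssl` and `urllib.request` modules. This is a minimal sketch, not how curl-impersonate works: Python's `ssl` sits on OpenSSL rather than NSS, so it cannot reproduce Firefox's full handshake. The cipher names are ordinary OpenSSL TLS 1.2 suites chosen for illustration, `EXTRA_CIPHER` is a hypothetical stand-in (not the cipher ESR 91 actually needed), and `make_profile` is a hypothetical helper.

```python
import ssl
import urllib.request

# Illustrative values only (assumptions, not curl-impersonate's settings):
# standard OpenSSL TLS 1.2 suite names plus a Firefox 91-style User-Agent.
BASE_CIPHERS = [
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
]
EXTRA_CIPHER = "ECDHE-RSA-AES256-GCM-SHA384"  # hypothetical stand-in
ESR_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"


def make_profile(ciphers, user_agent, url):
    """Hypothetical helper: build a TLS context and request for one 'browser'."""
    ctx = ssl.create_default_context()
    # OpenSSL cipher strings are colon-separated; this controls the TLS <= 1.2
    # suites only (TLS 1.3 suites are configured separately by OpenSSL).
    ctx.set_ciphers(":".join(ciphers))
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    return ctx, req


ctx, req = make_profile(BASE_CIPHERS + [EXTRA_CIPHER], ESR_UA, "https://example.com/")
names = [c["name"] for c in ctx.get_ciphers()]
print(names)
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```

Calling `urllib.request.urlopen(req, context=ctx)` would send the request with these settings. The rest of the ClientHello (extension order, curves, and so on) still comes from OpenSSL, which is why faithful impersonation needs changes at the TLS-library level.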