mnmkng | 3 years ago
Headless browsers are useful when the servers are protected by anti-scraping software that you can't reverse engineer, when the data you need is generated dynamically (computed in the browser rather than downloaded), or simply when you don't have the time to understand the website at a deeper level.
Usually it's a tradeoff between development cost and runtime cost. In our case, we always try plain HTTP first. If there's no obvious way to make it work, we go with browsers, then come back later to optimize the scraper, using plain HTTP alone or a combination of plain HTTP and browsers for specific requests such as logins, tokens, or cookies.
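The escalation strategy above can be sketched roughly like this. This is a minimal, hypothetical illustration, not the commenter's actual code: the heuristics in `needs_browser` and the `http_get`/`browser_get` callables are assumptions standing in for a real HTTP client and a real headless-browser driver.

```python
def needs_browser(status: int, body: str) -> bool:
    """Heuristic: decide whether a plain-HTTP response is good enough,
    or whether we need to escalate to a headless browser.
    The checks below are illustrative assumptions, not a real ruleset."""
    if status in (403, 429):
        # Anti-scraping software blocked or throttled the plain request.
        return True
    if "captcha" in body.lower():
        # We got a challenge page instead of the data.
        return True
    if '<div id="app"></div>' in body and len(body) < 2000:
        # Data is computed client-side: the HTML shell carries no content.
        return True
    return False


def scrape(url: str, http_get, browser_get) -> str:
    """Try cheap plain HTTP first; fall back to the expensive browser
    only when the plain response is unusable."""
    status, body = http_get(url)
    if needs_browser(status, body):
        return browser_get(url)  # always renders, but costs far more
    return body
```

In the hybrid variant the comment mentions, `browser_get` would be used once to perform a login and collect cookies or tokens, after which the bulk of the requests go through the cheap plain-HTTP path with those credentials attached.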