item 42815905

fbouvier | 1 year ago

I fully understand your concern and agree that scrapers shouldn't be hurting web servers.

I don't think they are using our browser :)

But in my opinion, blocking a browser as such is not the right solution. In this case, it's the user who should be blocked, not the browser.

jjcoffman | 1 year ago

If your browser doesn't play nicely and obey robots.txt when it's headless, I don't think it's that crazy to block the browser and not the user.
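For what it's worth, checking robots.txt is already in Python's standard library; a minimal sketch of the check a well-behaved client would do before fetching (the rules and URLs below are made-up examples):

```python
from urllib.robotparser import RobotFileParser

# Parse a hypothetical robots.txt (normally fetched from the site).
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A well-behaved client asks before fetching each URL.
print(rp.can_fetch("MyBot", "https://example.com/private/page"))  # False
print(rp.can_fetch("MyBot", "https://example.com/public/page"))   # True
```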

fbouvier | 1 year ago

Every tool can be used in a good or a bad way: Chrome, Firefox, cURL, etc. It's not the browser that doesn't play nicely, it's the user.

It's the user's responsibility to behave well, like in life :)

hansvm | 1 year ago

The first thing that came to mind when I saw this project wasn't scraping (where I'd typically want either a less detectable browser or a more performant option), but a browser engine that's actually sane to link against if I wanted to, e.g., write a modern TUI browser.

Banning the root library (even if you could, given UA spoofing and whatnot) is right up there with banning Chrome to keep out low-wage scraping centers and their armies of employees. It's not even a little effective, and it also risks significant collateral damage.

slt2021 | 1 year ago

It is trivial to spoof a user-agent. If you want to stop a motivated scraper, you need a different solution, one that exploits the fact that bots use a headless browser.
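To illustrate how trivial it is: spoofing a user-agent is a one-line header override in most HTTP clients. A minimal Python sketch (the URL and UA string are made-up examples, and no request is actually sent):

```python
import urllib.request

# A made-up User-Agent string claiming to be a regular desktop browser.
spoofed_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# Build the request with the spoofed header; opening it would send
# this value to the server instead of the client's real identity.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": spoofed_ua},
)

print(req.get_header("User-agent"))  # prints the spoofed string
```

This is why server-side blocking keyed on the User-Agent header alone can only deter the laziest scrapers; anything more robust has to look at behavior or at properties of the headless environment itself.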