AI agents have led to a big surge in scraping and crawling activity on the web, and many of them don't use proper user agents or follow any of the scraping best practices the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN. Do you have any built-in features that address these issues?
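For context, the baseline the question refers to is easy to honor. A minimal sketch using Python's standard-library `urllib.robotparser` (the agent name and rules here are illustrative, not any particular crawler's):

```python
import urllib.robotparser

def fetch_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Check a robots.txt body for permission before crawling a path."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, path)

# Hypothetical robots.txt that blocks one agent from /private/
rules = "User-agent: ExampleAgent\nDisallow: /private/\n"
print(fetch_allowed(rules, "ExampleAgent", "/private/page"))  # False
print(fetch_allowed(rules, "ExampleAgent", "/public/page"))   # True
```

`RobotFileParser` also exposes `crawl_delay()` and `request_rate()`, which cover the rate-limit half of the complaint when sites declare them.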
MagMueller|1 year ago
On most platforms, Browser Use only needs the interactive elements, which we extract; it does not need images or videos. We have not yet implemented this optimization, but it will reduce costs for both parties.
Our goal is to abstract backend functionality from webpages. We could cache this, and only update the cache when ETags change.
Websites that really don't want us will come up with audio captchas and new creative methods.
Agents are different from bots. Agents are intended as direct user clones and could also bring revenue to websites.
erellsworth|1 year ago
Which you or other AIs will then figure a way around. You literally mention "extract data behind login walls" as one of your use cases so it sounds like you just don't give a shit about the websites you are impacting.
It's like saying, "If you really don't want me to break into your house and rifle through your stuff, you should just buy a more expensive security system."