jackienotchan | 1 year ago

AI agents have led to a big surge in scraping/crawling activity on the web, and many don't use proper user agents or stick to any of the scraping best practices the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN.

Do you have any built-in features that address these issues?
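(For concreteness, the best practices in question boil down to sending a descriptive User-Agent, checking robots.txt, and rate limiting. A minimal Python sketch of a well-behaved fetch; the bot name and URLs are placeholders, not any real crawler:)

    import time
    import urllib.robotparser

    import requests

    # Hypothetical bot identity; polite crawlers link to a page explaining the bot.
    AGENT = "ExampleAgent/1.0 (+https://example.com/bot-info)"

    # Fetch and parse the site's robots.txt before requesting anything else.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    url = "https://example.com/some/page"
    if rp.can_fetch(AGENT, url):
        resp = requests.get(url, headers={"User-Agent": AGENT})
        # Honor an advertised Crawl-delay, falling back to a polite default.
        time.sleep(rp.crawl_delay(AGENT) or 1.0)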

MagMueller | 1 year ago

Yes, some hosting services have experienced a 100%-1000% increase in hosting costs.

On most platforms, Browser Use only needs the interactive elements, which we extract; it does not need images or videos. We have not yet implemented this optimization, but it will reduce costs for both parties.
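(For the curious: modern browser automation can drop media at the network layer. A rough sketch using Playwright's request routing; this is an assumption about how it could be done, not necessarily how Browser Use does it:)

    from playwright.sync_api import sync_playwright

    # Resource types the agent can skip: it acts on the DOM, not on pixels.
    SKIP = ("image", "media", "font")

    with sync_playwright() as p:
        page = p.chromium.launch().new_page()
        # Abort requests for skipped types; let everything else through.
        page.route("**/*", lambda route: route.abort()
                   if route.request.resource_type in SKIP
                   else route.continue_())
        page.goto("https://example.com")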

Our goal is to abstract the backend functionality of webpages. We could cache this and only update the cache when ETags change.
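(In HTTP terms, that is conditional revalidation: resend the stored ETag as If-None-Match, and a 304 response means the cached copy is still current. A rough sketch; the in-memory cache and helper name are just for illustration:)

    import requests

    cache = {}  # url -> (etag, body); illustrative in-memory store

    def fetch_with_etag(url):
        headers = {}
        if url in cache:
            headers["If-None-Match"] = cache[url][0]
        resp = requests.get(url, headers=headers)
        if resp.status_code == 304:
            return cache[url][1]  # 304 Not Modified: cached copy is current
        etag = resp.headers.get("ETag")
        if etag:
            cache[url] = (etag, resp.text)
        return resp.text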

Websites that really don't want us will come up with audio captchas and new creative methods.

Agents are different from bots: an agent is intended to act as a direct clone of its user, and could also bring revenue to websites.

erellsworth | 1 year ago

>Websites that really don't want us will come up with audio captchas and new creative methods.

Which you or other AIs will then figure out a way around. You literally mention "extract data behind login walls" as one of your use cases, so it sounds like you just don't give a shit about the websites you are impacting.

It's like saying, "If you really don't want me to break into your house and rifle through your stuff, you should just buy a more expensive security system."

deadfece | 1 year ago

In my experience, these web agents are relatively expensive to run and very slow. Admittedly, I don't browse HN frequently, but I'd be interested to read some of these agent abuse stories, if any stand out to you. (I've been googling for AI agent website abuse stories and not finding anything so far.)