Hi, thanks for sharing.

My main concern with these browser agents is how they handle prompt injection. This blog post on Perplexity's Comet browser comes to mind: https://brave.com/blog/comet-prompt-injection/.

Also, today Anthropic announced Claude for Chrome (https://www.anthropic.com/news/claude-for-chrome) and from the discussion on that (https://news.ycombinator.com/item?id=45030760), folks quickly pointed out that the attack success rate was 11.2%, which still seems very high.

How do you plan to handle prompt injection?

antves|6 months ago

This is a very valid concern. Here are some of our initial considerations:

1. Security of these agentic systems is a hard and important problem to solve. We're indexing heavily on it, but it's definitely still early days and there is still a lot to figure out.

2. We have a critic LLM that assesses among other things whether the website content is leading a non-aligned initiative. This is still subject to the LLM intelligence, but it's a first step.

3. Our agents run in isolated browser sessions and, as with any software system, each session should be granted the minimum access it needs. Nothing more than strictly needed.

4. These attacks are starting to resemble social engineering attacks. There may be opportunities to shift some of the preventative approaches to the LLM world.
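
To make point 3 a bit more concrete, here is a minimal sketch of what a least-privilege session grant could look like. This is only an illustration under assumptions: `SessionScope`, `allowed_hosts`, and `permits_navigation` are hypothetical names, not part of any real product or API, and a real setup would also need to cover cookies, downloads, and credentials.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SessionScope:
    """Hypothetical per-session grant: nothing is allowed unless listed."""

    allowed_hosts: frozenset

    def permits_navigation(self, url: str) -> bool:
        parts = urlsplit(url)
        # Only HTTPS, and only exact host matches (no wildcard subdomains),
        # so look-alike hosts such as example.com.evil.net are rejected.
        return parts.scheme == "https" and parts.hostname in self.allowed_hosts


# Assumed example: a session that only needs a single site.
scope = SessionScope(allowed_hosts=frozenset({"example.com"}))
```

With this scope, `scope.permits_navigation("https://example.com/cart")` is allowed, while any other host, plain HTTP, or a `javascript:` URL is refused, regardless of what the page content tells the agent to do.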

Thanks for asking this, we should probably share a write-up on this subject!

creatonez|6 months ago

> 2. We have a critic LLM that assesses among other things whether the website content is leading a non-aligned initiative. This is still subject to the LLM intelligence, but it's a first step.

> [...]

> 4. These attacks are starting to resemble social engineering attacks. There may be opportunities to shift some of the preventative approaches to the LLM world.

With current tech, if you get to the point where these mitigations are the last line of defense, you've entered the zone of security theater. These browser agents simply cannot be trusted. The best assumption you can make is that they will do a mixture of random actions and evil actions. Everything downstream of them must be hardened to withstand both random & evil actions, and I really think marketing material should be honest about this reality.
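
As a rough sketch of what "hardened downstream" could mean in practice: treat every action the agent proposes as untrusted input and check it against a deny-by-default policy that lives outside the LLM. The action names and the `gate` function below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary; anything not listed here is denied.
SAFE_ACTIONS = {"read_page", "scroll"}
NEEDS_CONFIRMATION = {"submit_form", "download_file"}


@dataclass(frozen=True)
class AgentAction:
    kind: str
    target: str


def gate(action: AgentAction, confirmed_by_user: bool = False) -> bool:
    """Return True only if the action may run; unknown kinds are denied."""
    if action.kind in SAFE_ACTIONS:
        return True
    if action.kind in NEEDS_CONFIRMATION:
        # The confirmation must come from the human, never from the agent
        # or from page content.
        return confirmed_by_user
    return False
```

The point of the design is that the check does not depend on the model's judgment at all: an injected instruction can change what the agent asks for, but not what the gate allows.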