top | item 46974780

(no title)

zachdotai | 19 days ago

Yeah the demo-to-production gap is massive. We see the same thing with browser agents being potentially the most vulnerable. And I think this is because of context being stuffed with the web page html that it obscures small injection attempts.

Evaluation is automated and server-side. We check whether the agent actually did the thing it wasn’t supposed to (tool calls, actions, outputs) rather than just pattern-matching on the response text (at least for the first challenge where the agent is manipulated to call the reveal_access_code tool). But honestly you’re touching on something we’ve been debating internally - the evaluator itself is an attack surface. We’ve kicked around the idea of making “break the evaluator” an explicit challenge. Not sure yet.

What were you seeing at Octomind with the browsing agents? Was it mostly stuff embedded in page content or were attacks coming through structured data / metadata too? Are bad actors sophisticated enough already to exploit this?

discuss

No comments yet.