top | item 44350489

(no title)

samcat116 | 8 months ago

> I can’t see a way around that except to have some kind of sandboxing or a concept of untrusted or tainted input rather than treating all tokens as the same. Maybe a way of detecting if the response of a tool is within a threshold of acceptability within the definition of the MCP (which is easier with structured output), which is used to force a manual confirmation or straight up rejection if it’s deemed to be unusual or unsafe.

I think we are starting to see these remote agent environments where each agent session gets its own sandbox environment to run things in. I bet thats where this is going.

discuss

order