Agreed that if you focus on any specific task, there's a safe way to do it, but the challenge is to handle arbitrary natural language requests from the user. That's what the Privileged LLM in the article is for: given a user prompt and only the trusted snippets of conversation history, figure out what action should be taken and how the Quarantined LLM should be used to power the inputs to that action. I think you really need that kind of two-layer approach for the general use case of an AI assistant.
williamcotton|2 years ago
Here’s an example of what I mean:
https://github.com/williamcotton/transynthetical-engine#brow...
By keeping the main discourse between the user and the LLM from containing all of the generated code and instead just using that main “thread” to orchestrate instructions to write code it allows for more back-and-forth.
It’s a good technique in general!
I’m still too paranoid to execute instructions via email without a very limited set of abilities!