top | item 35945040

(no title)

Giving different permissions levels to different email senders would be very challenging to implement reliably with LLMs. With an AI assistant like this, the typical implementation would be to feed it the current instruction, history of interactions, content of recent emails, etc, and ask it what command to run to best achieve the most recent instruction. You could try to ask the LLM to say which email the command originates from, but if there's a prompt injection, the LLM can be tricked in to lying about that. Any permissions details need to be implemented outside the LLM, but that pretty much means that each email would need to be handled in its own isolated LLM instance, which means that it's impossible to implement features like summarizing all recent emails.

discuss

williamcotton|2 years ago

You don’t need to ask the LLM where the email came from or provide the LLM with the email address. You just take the subject and the body of the email and provide that to the LLM, and then take the response from the LLM along with the unaffected email address to make the API calls…

  addTodoItem(taintedLLMtranslation, untaintedOriginalEmailAddress)

As for summaries, don’t allow that output to make API calls or be eval’d! Sure, it might be in pig latin from a prompt injection but it won’t be executing arbitrary code or even making API calls to delete Todo items.

All of the data that came from remote commands, such as the body of a newly created Todo item, should still be considered tainted and and treated in a similar manner.

These are the exact same security issues for any case of remote API calls with arbitrary execution.

alangpierce|2 years ago

Agreed that if you focus on any specific task, there's a safe way to do it, but the challenge is to handle arbitrary natural language requests from the user. That's what the Privileged LLM in the article is for: given a user prompt and only the trusted snippets of conversation history, figure out what action should be taken and how the Quarantined LLM should be used to power the inputs to that action. I think you really need that kind of two-layer approach for the general use case of an AI assistant.