top | item 44101432

(no title)

marcfisc | 9 months ago

Sadly, these ideas have been explored before, e.g.: https://simonwillison.net/2022/Sep/17/prompt-injection-more-...

Also, OpenAI has proposed ways of training LLMs to trust tool outputs less than User instructions (https://arxiv.org/pdf/2404.13208). That also doesn't work against these attacks.

discuss

No comments yet.