top | item 35593439

(no title)

anyekwest | 2 years ago

What if we just train LLMs to remove prompt injections from inputs? I feel like this isn't an intractable problem.

discuss

order

te_chris|2 years ago

The author addressed this: why would the model built on the hallucinating technique be able to police the main hallucinator

kolinko|2 years ago

He didn't really.

TisButMe|2 years ago

(author here) How do you know what's a prompt injection vs actual content? If you train another LLM to tell you what's a prompt injection, how do you know it has 100% coverage of all possible injections? OpenAI has been battling people trying to bypass their prompt re-write filter, and as far as I can see, not really winning, just constantly adding stuff to their blocklist until the next thing gets discovered.