krethh | 14 days ago
That's the "draw the rest of the owl" of this problem.
Until we figure out a robust theoretical framework for identifying prompt injections (and we're nowhere close to that, to my knowledge; as OP pointed out, all models are getting jailbroken all the time), human-in-the-loop will remain the only defense.
CuriouslyC | 14 days ago
krethh | 14 days ago