top | item 46681730

0xdeadf1sh | 1 month ago

This might work on a small 7B or 14B model, but >70B models are already pretty good at identifying prompt injections. You'd probably need to use weird, out-of-distribution tokens (remember SolidGoldMagikarp?).
