top | item 47108728

(no title)

aix1 | 8 days ago

Yeah, how exactly would that work?


CuriouslyC|8 days ago

A schema with response metadata, so that any response deviating from it fails automatically, plus a challenge question calibrated to be hard enough that a prompt injection, by disrupting instruction following, causes the model to answer it incorrectly.
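A rough sketch of how that check might look, assuming the model is asked to reply in JSON. The field names (`answer`, `challenge_answer`, `schema_version`) and the `check_response` helper are hypothetical, made up for illustration, and how to calibrate the challenge difficulty is not covered here:

```python
import json

# Hypothetical schema: every response must carry exactly these fields.
REQUIRED_FIELDS = {"answer": str, "challenge_answer": str, "schema_version": int}


def check_response(raw: str, expected_challenge: str, schema_version: int = 1) -> bool:
    """Return True only if the response matches the schema and solves the challenge."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False  # free-form text (e.g. a hijacked reply) fails automatically

    # Reject non-objects and any missing or extra fields.
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        return False

    for name, expected_type in REQUIRED_FIELDS.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return False

    if data["schema_version"] != schema_version:
        return False

    # The challenge answer is known to the caller but never sent back to the model.
    return data["challenge_answer"].strip() == expected_challenge
```

The caller would generate a fresh challenge per request, keep the expected answer on its side, and discard any response for which `check_response` returns `False`.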