

omnster | 2 years ago

There is quite a lot of excitement about how someone has apparently hacked the language model into outputting what was supposed to be a non-public set of rules. How do people know that this is indeed the secret set of rules, and not a list that the model was scripted to return in response to a (perhaps somewhat elaborate) request for the list of rules?


simonw|2 years ago

We don't know for sure - but we have seen this same situation play out many times for many other systems. It's far more likely that this attack worked than that this particular team has solved a problem that has defeated basically everyone else. https://news.ycombinator.com/item?id=35925239