There is quite a lot of excitement about someone apparently having hacked the language model into outputting what was supposed to be a non-public set of rules. How do people know that this is indeed the secret set of rules, and not a list the model was scripted to return in response to a (perhaps somewhat elaborate) request for its rules?
simonw|2 years ago