I'm not going to argue these are good tests. If you asked a coworker these questions, they'd look at you weird. But what surprised me is that you can take a sentence that has never been written down before, base64-encode it, and ask an LLM to decode it, and the good models do this surprisingly well.
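For anyone who wants to try this, here is a rough sketch of the setup in Python. It only covers the parts that don't depend on a particular model: encoding a made-up sentence, building a prompt, and scoring whatever text comes back. The helper names (`encode_sentence`, `build_prompt`, `score_reply`), the prompt wording, and the similarity score are all my own assumptions, not a standard benchmark, and the actual call to a model is left out.

```python
import base64
import difflib


def encode_sentence(sentence: str) -> str:
    """Return the base64 encoding of a UTF-8 sentence as an ASCII string."""
    return base64.b64encode(sentence.encode("utf-8")).decode("ascii")


def build_prompt(encoded: str) -> str:
    """Build a decode request; the wording here is an arbitrary choice."""
    return (
        "Decode this base64 string and reply with only the decoded text:\n"
        + encoded
    )


def score_reply(original: str, reply: str) -> float:
    """Similarity between the original sentence and the model's reply.

    1.0 means an exact match after trimming whitespace. Partial credit
    shows how close a near-miss decoding was.
    """
    return difflib.SequenceMatcher(None, original.strip(), reply.strip()).ratio()


# A sentence unlikely to appear verbatim in any training data.
sentence = "The purple walrus filed his taxes underwater on a Tuesday."
encoded = encode_sentence(sentence)
prompt = build_prompt(encoded)

# Sanity check: the encoding round-trips locally.
assert base64.b64decode(encoded).decode("utf-8") == sentence

print(prompt)
# Send `prompt` to the model of your choice, then:
# print(score_reply(sentence, model_reply))
```

Using a similarity ratio rather than a strict equality check is a judgment call; it makes it easier to see when a model gets most of the sentence right but garbles a word or two.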