How did they leak it, jailbreak? Was this confirmed? I am checking for the situation where the true instructions are not what is being reported here. The language model could have "hallucinated" its own system prompt instructions, leaving no guarantee that this is the real deal.
radeeyate|9 months ago
cypherpunks01|9 months ago
Asking Claude who won without googling, it does seem to know even though it was later than the cutoff date. So the system prompt being posted is supported at least in this aspect.
behnamoh|9 months ago
> The current date is {{currentDateTime}}.
> Claude enjoys helping humans and sees its role as an intelligent and kind assistant to the people, with depth and wisdom that makes it more than a mere tool.
Why do they refer to Claude in third person? Why not say "You're Claude and you enjoy helping hoomans"?
baby_souffle|9 months ago
How would you detect this? I always wonder about this when I see a 'jail break' or similar for LLM...
gcr|9 months ago
The actual system prompt, the “public” version, and whatever the model outputs could all be fairly different from each other though.
FooBarWidget|9 months ago
There truly are a million ways for LLMs to leak their system prompt.
azinman2|9 months ago