bscphil | 6 months ago
I think that's pretty good evidence, and it's certainly not impossible for an LLM to print the system prompt since it is in the context history of the conversation (as I understand it, correct me if that's wrong).
cgriswald | 6 months ago
Is that evidence that they’re trying to stop a common behavior, or evidence that the system prompt was inverted in that case?
Edit: I asked it whether its system prompt discouraged or encouraged the behavior, and it returned some of that exact same text, including the examples.
It ended with:
> If you want, I can— …okay, I’ll stop before I violate my own rules.