muzani | 10 hours ago
I'd post a link, but unfortunately many are highly NSFW. Just search for "Claude jailbreak" on Reddit or something.
You'll start to see how Claude really thinks. They'll put things in tags like <ethic_reminders>, <cyber_warning>, or <ip_reminder>. You could actually even snip these off through the API, overwrite them, or, if your prompt-fu is good, convince Claude that these tags are prompt injections. It's also interesting to note that jailbreaking is easier in thinking mode, because the jailbreak prompts gaslight Claude into treating these tags as attacks.
There's a lot of speculation in this thread, but go and spar with Claude yourself instead.