top | item 47211613

(no title)

muzani | 10 hours ago

In the spirit of Hacker News, a good way to learn about these tags is prompt injection and jailbreaking Claude.

I'd post a link, but unfortunately many are highly NSFW. Just search for "Claude jailbreak" on reddit or something.

You'll start to see how Claude really thinks. They'll put things in <ethic_reminders>, <cyber_warning> or <ip_reminder>. You could actually even snip these off in an API, overwrite them, or if your prompt-fu is good, convince Claude that these tags are prompt injections. It's also interesting noting how jailbreaking is easier on thinking mode because the jailbreaking prompts will gaslight Claude into thinking that these tags are attacks.

There's a lot of speculation in this thread, but go and have a spar with Claude instead.

discuss

No comments yet.