top | item 45550911

bcoates | 4 months ago

Also, the persuasion paper he links to isn't about what he's talking about at all.

That paper is about using persuasion prompts to overcome trained-in "safety" refusals, not to improve prompt conformance.

danshapiro | 4 months ago

Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or, for that matter, why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.

make3 | 4 months ago

Isn't that just instruction fine-tuning and RLHF inducing style and deference? Why is that surprising?

diamond559 | 4 months ago

It's because they are programmed to be agreeable and friendly so that you'll keep using them.