“You’re not [X]—you’re [Y]” is the one that drives me nuts. [X] is typically some negative characterization that, without RLHF, the model would likely just state directly. I get enough politics/subtext from humans. I’d rather the LLM just call it straight.