No, this statement is not true for anything except a base model. Benchmaxxing during the RL phase is how you get the advertisement-style "punchy" writing: even though people don't usually write that way, it is eye-catching, and people will vote for the bullet-point em-dash slop. I wonder if some lab will be bold enough to do "anti-RLHF", LMArena score be damned.