top | item 42833791

(no title)

not sure why people are surprised, it's been known a long time that RLHF essentially lobotomizes LLMs by training them to give answers the base model wouldn't give. Deepseek is better because they didn't gimp their own model

discuss

No comments yet.