I didn't, hence the "first". It's clear that being good at next-token prediction forces the models to learn a lot, including how to give such answers. But that's not their loss function. Presumably, with the right system prompt, they would be just as capable of lying and insulting you. And I doubt RLHF gets rid of this ability.
nearbuy|8 days ago
You could have just acknowledged they are roughly correct about RLHF, but brought up issues caused by pretraining.
> And I doubt RLHF gets rid of this ability.
The commenter you were replying to is worried that RLHF causes lying.