top | item 44219717

MrZander | 8 months ago

This doesn't match my understanding of transformers at all. I'm not aware of any human labeling in the training.

What would labeling even do for an LLM? (Not including multimodal)

The whole point of attention is that it uses existing text to determine when tokens are related to other tokens, no?
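That intuition can be sketched as minimal single-head scaled dot-product attention. This is a toy illustration with made-up data (names and values are mine, not from the article): each token's query is compared against every token's key, and the resulting weights decide how much of each token's value flows into the output.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: each query scores all keys,
    and the softmaxed scores mix the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V, weights

# toy self-attention: 3 tokens with 4-dim embeddings (random, for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)   # out: (3, 4); each row of w sums to 1
```

No human labels appear anywhere here: the relatedness of tokens comes entirely from the learned query/key projections applied to the input text itself.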


daveguy | 8 months ago

The transformers are accurately described in the article. The confusion comes in during the Reinforcement Learning from Human Feedback (RLHF) step that happens after the transformer-based model is trained. It's a fine-tuning stage on top of the base model that adjusts its next-word (or phrase) predictions based on human preference ratings. It's really just a layer that makes these models sound "better" to humans. And it's a great way to muddy the hype and make humans get warm fuzzies about the LLM's responses.
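A toy sketch of the objective behind that feedback step, assuming the standard pairwise (Bradley-Terry) reward-model loss used in RLHF pipelines (function names are mine, for illustration only): a reward model is trained so that the response a human preferred scores higher than the rejected one.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss for a reward model:
    -log P(chosen beats rejected), where P is a sigmoid of the
    reward margin. Low loss when the preferred response scores higher."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Equal rewards: the model is indifferent, loss = ln 2.
# A bigger margin for the human-preferred response drives the loss toward 0.
indifferent = preference_loss(0.0, 0.0)
confident = preference_loss(3.0, 0.0)
```

The LLM is then fine-tuned (e.g. with a policy-gradient method) to produce responses this reward model scores highly, which is why the human-feedback stage changes how outputs sound without retraining the underlying transformer from scratch.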

MrZander | 8 months ago

Oh, interesting, TIL. Didn't realize there was a second step to training these models.