trhway | 3 days ago
The question puts the cart before the horse. The main point isn't "from", it is how you get to “predict the next word.” During training, the LLM builds inside itself a compressed, aggregated representation (a model) of what is fed into it. Given that model, you can "predict the next word", and you can do a lot of other things as well.
For a simple starting point for understanding, I'd suggest looking back at the foundation stone that started it all, the "sentiment neuron":
https://openai.com/index/unsupervised-sentiment-neuron/
"simply predicting the next character in Amazon reviews resulted in discovering the concept of sentiment.
...
Digging in, we realized there actually existed a single “sentiment neuron” that’s highly predictive of the sentiment value."
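To make "a single neuron that's highly predictive" concrete, here is a rough toy sketch. It is not the procedure from the linked post: the activations are synthetic, and every name in it (`make_toy_data`, `most_predictive_unit`, the planted unit index 5) is made up for illustration. It only shows the idea of scanning hidden units for the one that correlates best with a sentiment label.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def most_predictive_unit(activations, labels):
    """Return (index, |correlation|) of the unit most correlated with labels."""
    n_units = len(activations[0])
    scores = [
        abs(pearson([row[j] for row in activations], labels))
        for j in range(n_units)
    ]
    best = max(range(n_units), key=scores.__getitem__)
    return best, scores[best]


def make_toy_data(n_samples=200, n_units=16, sentiment_unit=5, seed=0):
    """Synthetic stand-in for hidden states: noise everywhere except one
    planted unit that tracks the (hypothetical) sentiment label."""
    rng = random.Random(seed)
    labels = [rng.choice([0, 1]) for _ in range(n_samples)]
    activations = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        # Planted signal: positive label pushes this unit up, negative down.
        row[sentiment_unit] = (2.0 if y else -2.0) + rng.gauss(0.0, 0.5)
        activations.append(row)
    return activations, labels


activations, labels = make_toy_data()
unit, score = most_predictive_unit(activations, labels)
print(f"most predictive unit: {unit}, |r| = {score:.2f}")
```

In a real model the activations would come from the trained network's hidden state on labeled reviews rather than from a random generator; the scan over units would look much the same.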