karamanolev | 3 days ago
What does that even mean? Their weights are essentially created by training. There aren't some magic golden weights that are then distorted.
grey-area | 3 days ago
1. Weights in the model are created by ingesting the corpus
2. Techniques like reinforcement learning, alignment etc are used to adjust those weights before model release
3. At inference time, the model is used and more context (the prompt) is injected, which then affects which words it will choose, though the choice is still heavily biased by the corpus and training.
That could be way off base though, I'd welcome correction on that.
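To make stage 3 concrete, here is a minimal sketch of what "predicting the next word from weights built on a corpus" means. It uses a toy bigram model (just word-pair counts, nothing like a real transformer) as a stand-in for learned weights — an illustrative assumption, not how LLMs are actually implemented:

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for the training data.
corpus = "the cat sat on the mat and the cat slept".split()

# "Training": here the weights are just counts of which word follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(prev, rng):
    # Sample the next word in proportion to how often it followed `prev`.
    words = list(counts[prev])
    freqs = [counts[prev][w] for w in words]
    return rng.choices(words, weights=freqs, k=1)[0]

rng = random.Random(0)
out = ["the"]
for _ in range(6):
    if not counts[out[-1]]:  # dead end: word was never seen mid-corpus
        break
    out.append(next_word(out[-1], rng))
print(" ".join(out))
```

Every generated word pair is one that occurred in the corpus, which is the sense in which the output is "heavily biased by the corpus" even though sampling makes it non-deterministic.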
The point I was trying to make, though, was that they do more than predict the next word from just one set of data. Their weights can encode entire passages of the source material in the training data (https://arxiv.org/abs/2505.12546), including books and programs. This is why they are so effective at generating code snippets.
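The memorization point can be caricatured with an extreme toy: a "model" whose weights are a lookup table literally storing every prefix of a training passage. Real LLMs encode this statistically (and only sometimes verbatim, per the linked paper), but the effect — greedy decoding replaying training text — is the same in spirit:

```python
# Extreme toy: "weights" that map each prefix of a training passage
# to its next word. Real models store this statistically, not as a table.
passage = "it was the best of times it was the worst of times".split()

table = {tuple(passage[:i]): passage[i] for i in range(1, len(passage))}

# Prompt with the first three words, then decode greedily.
out = passage[:3]
while tuple(out) in table:
    out = out + [table[tuple(out)]]

print(" ".join(out))  # replays the passage verbatim
```

Given a prefix from the training data, the output is the memorized continuation, word for word — which is roughly what "weights can encode entire passages" looks like at the extreme.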
Also, text injected at the last stage during use carries far less weight than most people assume (e.g. https://georggrab.net/content/opus46retrieval.html) and is not, IMO, read and understood.
There are a lot of inputs nowadays and a lot of stages to training. So while I don't think they are intelligent, I think it is reductive to call them next-token predictors or similar. I'm not sure what the best name for them is, but they are neither next-word predictors nor intelligent agents.
karamanolev | 3 days ago
You're right that the weights can enable the model to memorize training data.
joquarky | 3 days ago
grey-area | 2 days ago
Are you saying it is a separate process which scrubs output before we see it?