DavidSJ | 14 hours ago
[Edit: but to be clear, for a pretrained model this probability means "what's my estimate of the conditional probability of this token occurring in the pretraining dataset?", not "how likely is this statement to be true?" And for a post-trained model, the probability really has no simple interpretation other than "this is the probability that I will output this token in this situation".]
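To make the "probability of outputting this token" reading concrete, here is a minimal sketch (with made-up logits and a toy 4-token vocabulary) of how a next-token distribution is obtained from a model's raw logits via softmax:

```python
import math

# Hypothetical next-token logits over a toy 4-token vocabulary.
logits = {"cat": 2.0, "dog": 1.0, "car": 0.5, "tree": -1.0}

# Softmax converts logits into the model's output distribution.
# For a pretrained model this is an estimate of P(token | context)
# under the pretraining data; for a post-trained model it is simply
# the probability that the model emits that token next.
m = max(logits.values())  # subtract max for numerical stability
exps = {tok: math.exp(v - m) for tok, v in logits.items()}
z = sum(exps.values())
probs = {tok: e / z for tok, e in exps.items()}

print(probs)  # probabilities sum to 1; "cat" is the most likely token
```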
mr_toad | 11 hours ago
Basically, you’d need far more computing power to compute the full distribution over an LLM’s outputs than to produce a single answer.
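A rough back-of-the-envelope sketch of why (the vocabulary size and answer length below are illustrative assumptions, not measurements): sampling one answer costs one forward pass per token, while the exact distribution over all possible answers of that length requires scoring every sequence.

```python
# Illustrative numbers: a ~50k-token vocabulary and a 20-token answer.
vocab_size = 50_000
answer_length = 20

# One sampled answer: one forward pass per generated token.
single_answer_forward_passes = answer_length

# Exact distribution over all answers of this length: one score per
# possible sequence, of which there are vocab_size ** answer_length.
num_possible_sequences = vocab_size ** answer_length

print(single_answer_forward_passes)   # 20
print(num_possible_sequences)         # astronomically large
```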
podnami | 14 hours ago
DavidSJ | 14 hours ago
But the model's "shape" and computation graph themselves don't change as a result of post-training. All that changes is the weights in the matrices.
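A tiny sketch of this point, using a hypothetical one-layer "model": the forward computation (the graph) is a fixed function, and post-training only replaces the numbers inside the weight matrix.

```python
# The computation graph: a single matrix-vector product. This function
# is identical before and after post-training.
def forward(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Hypothetical 2x2 weight matrices: same shape, different values.
pretrained_weights = [[1.0, 0.0], [0.0, 1.0]]
post_trained_weights = [[0.9, 0.1], [0.2, 0.8]]

x = [1.0, 2.0]
y_pre = forward(pretrained_weights, x)    # output of the pretrained model
y_post = forward(post_trained_weights, x)  # same graph, new weights
```

Only the values flowing through the same operations differ, which is why post-training can change behavior so much without changing the architecture.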