(no title)
golol | 6 months ago
IMO many misrepresentations.
- Pretraining to predict the next token imposes no bias against surprise, except that low-probability tokens are more likely to carry a large relative error in their estimated probabilities.
- Using a temperature lower than 1 does impose a direct bias against surprise (see the sketch after this list).
- Finetuning of various kinds (instruction, RLHF, safety) may increase or decrease surprise. But the kind of behavior aimed for in finetuning certainly harms the capability to tell jokes significantly.
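
To make the temperature point concrete, here is a minimal sketch (toy, hypothetical logits in plain NumPy, not tied to any particular model): dividing logits by a temperature T < 1 sharpens the softmax, concentrating probability on the most likely tokens and lowering the entropy of the next-token distribution, i.e. the expected surprise.

    # Toy illustration: how sampling temperature changes the entropy
    # ("expected surprise") of the output distribution. T < 1 sharpens
    # the softmax; T > 1 flattens it.
    import numpy as np

    def softmax_with_temperature(logits, temperature=1.0):
        """Convert logits to probabilities, scaled by temperature."""
        scaled = np.asarray(logits, dtype=float) / temperature
        scaled -= scaled.max()          # numerical stability
        exp = np.exp(scaled)
        return exp / exp.sum()

    def entropy(p):
        """Shannon entropy in nats; a proxy for expected surprise."""
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    logits = np.array([3.0, 1.5, 0.5, -1.0, -2.0])  # hypothetical next-token logits

    for t in (1.5, 1.0, 0.7, 0.3):
        probs = softmax_with_temperature(logits, t)
        print(f"T={t}: probs={np.round(probs, 3)}, entropy={entropy(probs):.3f} nats")

Running this, the entropy drops monotonically as T decreases: at low temperature the sampler almost always picks the highest-logit token, which is exactly a bias against surprising continuations.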
sigmoid10 | 6 months ago