
golol | 6 months ago

IMO many misrepresentations:

- Pretraining to predict the next token imposes no bias against surprise, except that low probabilities are more likely to carry a large relative error.

- Sampling with a temperature below 1 does impose a direct bias against surprise.

- Finetuning of various kinds (instruction, RLHF, safety) may increase or decrease surprise. But certainly the behaviors aimed for in finetuning significantly harm the capability to tell jokes.
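A minimal sketch of the temperature point, not from the thread: dividing the logits by a temperature T < 1 sharpens the softmax, moving probability mass away from low-probability ("surprising") tokens. The logit values below are made up purely for illustration.

    # Softmax with temperature: T < 1 suppresses rare tokens,
    # T = 1 leaves the model's distribution unchanged.
    import math

    def softmax_with_temperature(logits, temperature):
        """Convert raw logits to probabilities, scaled by temperature."""
        scaled = [l / temperature for l in logits]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    # Hypothetical next-token logits: one likely token, two unlikely ones.
    logits = [3.0, 1.0, 0.0]

    for t in (1.0, 0.7):
        probs = softmax_with_temperature(logits, t)
        print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))

    # Output:
    # T=1.0: 0.844, 0.114, 0.042   <- baseline distribution
    # T=0.7: 0.934, 0.054, 0.013   <- surprising tokens suppressed further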


sigmoid10 | 6 months ago

I think the whole discussion conflates two ideas: telling a joke and coming up with one. Telling a joke well is of course an art, but the punchline itself holds zero surprise if you have studied your lines well - like all good comedians do. The more you study, the better you can also react to impromptu situations.

Now, coming up with a completely original joke yourself, that's a different story. For that you actually have to venture outside the high-likelihood region and find the nice spots. But that is also really, really rare among humans, and I have only ever observed it in combination with external random influences. Without those, I doubt LLMs will be able to compete at all. Still, I fully believe a high-end, comedian-level LLM is possible given the right training data. It's just that none of the big players have ever cared to build such a model, since there is very little money in it compared to e.g. coding.