activatedgeek | 1 year ago

Congratulations on the strong reception of min-p. Very clever!

We may be talking about two orthogonal things here. And also to be clear, I don't care about theoretical guarantees either.

Now, min-p solves for the inadequacies of standard sampling techniques. It acts almost like a clever adaptive search, which other sampling methods fail at (despite truncations like top-k/top-p).
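As I understand it, the adaptive part is that min-p's truncation threshold scales with the model's confidence in its top token, rather than being a fixed count (top-k) or a fixed cumulative mass (top-p). A minimal sketch, assuming temperature is applied before filtering (function and parameter names here are illustrative, not any particular library's API):

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=None):
    """Sketch of min-p truncation sampling.

    Tokens whose probability falls below p_base * (max token probability)
    are discarded; the survivors are renormalized and sampled from.
    """
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                        # numerical stability for softmax
    probs = np.exp(z)
    probs /= probs.sum()
    threshold = p_base * probs.max()    # confidence-scaled cutoff
    keep = probs >= threshold           # adaptive truncation set
    trimmed = np.where(keep, probs, 0.0)
    trimmed /= trimmed.sum()
    return int(rng.choice(len(trimmed), p=trimmed))
```

When the model is confident (a sharply peaked distribution), the cutoff is high and only a few tokens survive; when it is uncertain (a flat distribution), the cutoff drops and more candidates remain, which is the adaptivity that fixed top-k/top-p truncation lacks.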

However, one thing I noticed in the min-p results is that lower temperatures were almost always better for final performance (and, quite expectedly, the inverse for creative writing). This observation makes me think that the underlying model is generally fairly good at ranking the best tokens. What sampling gives us is a margin for error in cases where the model ranked a relevant next token not at the top, but slightly lower.

Therefore, my takeaway from min-p is that it solves for deficiencies of current samplers, but its success is not in contradiction with the fact that logprobs are bad proxies for semantics. Sampling is the simplest form of search, and I agree with you that better sampling methods are a solid ingredient for extracting information from logprobs.
