Vetch | 5 months ago

It's an artifact of the post-training approach. Models like kimi k2 and gpt-oss do not utter such phrases and are quite happy to start sentences with "No" or something to the tune of "Wrong".

Diffusion also won't help in the way you seem to think it will. That the outputs occur in a sequence is not relevant; what's relevant is the underlying computation class backing each token output, and there, diffusion as typically done does not improve on autoregression. The argument is subtle, but the key is that the output dimension and the number of iterations in diffusion are hyperparameters fixed in advance: they do not scale arbitrarily large with problem complexity.
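
To make that concrete, here's a toy sketch (hypothetical `denoise` and `next_token` stand-ins for the learned networks, not any real model's API). The diffusion sampler's step count T and output size D are picked before the problem is ever seen, while the autoregressive loop can spend more compute simply by emitting more tokens:

    import numpy as np

    def diffusion_sample(denoise, D=512, T=50):
        """Toy diffusion sampler. Compute is fixed by the hyperparameters
        D (output dimension) and T (denoising steps), chosen before we
        ever see the problem -- neither grows with its difficulty."""
        x = np.random.default_rng(0).standard_normal(D)  # start from pure noise
        for t in range(T, 0, -1):       # exactly T refinement passes, always
            x = denoise(x, t)
        return x                        # total compute: T * cost(denoise)

    def autoregressive_sample(next_token, prompt, eos=0, max_len=10_000):
        """Toy autoregressive decoder. Compute scales with how many tokens
        the model emits, so a harder problem can buy itself more serial
        steps (e.g. a longer chain of thought)."""
        tokens = list(prompt)
        while len(tokens) < max_len:
            tok = next_token(tokens)    # one more sequential computation step
            tokens.append(tok)
            if tok == eos:              # the model decides when to stop...
                break
        return tokens                   # ...so compute is input-dependent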
