So, regarding the "reasoning" models the article references: is it possible that their increased error rate vs. non-reasoning models is simply a function of the reasoning process introducing more tokens into context, and that because each such token may itself introduce wrong information, the risk of error compounds? Or rather, that generating more tokens at a fixed per-token error rate must, on average, produce more errors?
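To make the compounding intuition concrete, here's a toy model (my own illustrative assumption, not anything from the article): if each generated token independently introduces an error with fixed probability p, the chance of at least one error in the output grows quickly with length n.

```python
def p_at_least_one_error(p: float, n: int) -> float:
    """P(at least one error) = 1 - (1 - p)^n, assuming independent per-token errors."""
    return 1 - (1 - p) ** n

# Same per-token error rate, very different output lengths:
short_answer = p_at_least_one_error(0.001, 200)    # ~0.18
long_reasoning = p_at_least_one_error(0.001, 4000)  # ~0.98

print(f"{short_answer:.2f} vs {long_reasoning:.2f}")
```

Under this (admittedly simplistic, independence-assuming) model, a long chain-of-thought is almost guaranteed to contain at least one erroneous token even when the per-token error rate is tiny, which matches the "more tokens, more compounded risk" framing above.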