top | item 47132809

(no title)

An LLM uses constant compute per output token (one forward pass through the model), so the only computational mechanism to increase 'thinking' quantity is to emit more tokens. Hence why reasoning models produce many intermediary tokens that are not shown to the user, as mentioned in other replies here. This is also why the accuracy of "reasoning traces" is hotly debated; the words themselves may not matter so much as simply providing a compute scratch space.

Alternative approaches like "reasoning in the latent space" are active research areas, but have not yet found major success.

discuss

No comments yet.