top | item 45699045

(no title)

skinner_ | 4 months ago

That's very cool, but it's not an apples to apples comparison. The reasoning model learned how to do long multiplication. (Either from the internet, or from generated examples of long multiplication that were used to sharpen its reasoning skills. In principle, it might have invented it on its own during RL, but no, I don't think so.)

In this paper, the task is to learn how to multiply, strictly from AxB=C examples, with 4-digit numbers. Their vanilla transformer can't learn it, but the one with (their variant of) chain-of-thought can. These are transformers that have never encountered written text, and are too small to understand any of it anyway.

discuss

No comments yet.