top | item 47191664


18al | 1 day ago

Depends on how the transformer has been trained. If it has seen 11-digit examples during training it might work; otherwise the input will be out of distribution and it will respond with a nonsensical number.

For instance, the current high-score model (311 params [0]), when given 12345678900 + 1, responds with 96913456789.

An interesting experiment would be: what's the minimum number of parameters required to handle unbounded addition (without offloading it to tool calls).

Of course memory constraints would preclude such an experiment. So a sensible proxy would be: what kind of neural-net architecture and training would allow a model to handle number lengths it hasn't been trained on? I suspect this may not be possible.

[0] https://github.com/rezabyt/digit-addition-311p


blackbear_ | 15 hours ago

> what kind of neural-net architecture and training would allow a model to handle numbers lengths it hasn't been trained on

A recurrent neural network implementing binary addition with carry could do this, and one can derive the correct weights with pen and paper without too much effort.

Whether gradient descent will find them too is another matter entirely.
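The pen-and-paper derivation is just full-adder logic: the recurrent state is the carry bit, the new carry is majority(a, b, carry), and the sum bit is a XOR b XOR carry. A minimal numpy sketch (hand-set weights, not learned; the hard threshold stands in for whatever nonlinearity a trained net would approximate):

```python
import numpy as np

def step(x):
    # hard threshold activation; a trained net would use a smooth surrogate
    return (np.asarray(x) >= 0).astype(int)

def rnn_add(a_bits, b_bits):
    """Add two little-endian bit lists with a one-unit recurrent carry state.
    The weights/thresholds below are derived by hand from full-adder logic."""
    n = max(len(a_bits), len(b_bits)) + 1  # one extra step for a final carry
    a_bits = a_bits + [0] * (n - len(a_bits))
    b_bits = b_bits + [0] * (n - len(b_bits))
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        # recurrent unit: new carry = 1 iff a + b + carry >= 2 (majority)
        new_carry = int(step(a + b + carry - 1.5))
        # output unit: sum bit = a + b + carry - 2 * new_carry (i.e. XOR)
        out.append(int(step(a + b + carry - 2 * new_carry - 0.5)))
        carry = new_carry
    return out

def to_bits(x):   # integer -> little-endian bit list
    return [int(c) for c in bin(x)[2:][::-1]]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

# The same fixed weights handle any operand length, including lengths
# never "seen" -- exactly the generalization the comment describes.
print(from_bits(rnn_add(to_bits(12345678900), to_bits(1))))  # 12345678901
```

Since the carry is the only state, length generalization is trivial by construction; the open question in the comment is whether gradient descent converges to this solution rather than a length-specific shortcut.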

wizzwizz4 | 18 hours ago

If the neural network had moveable tape heads which could seek between invocations, and the inputs were provided in little-endian format, a fairly small model could implement arbitrary addition with carry, and you'd only need to add a few redundant dimensions to get something that could be trained.
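The little-endian point is the key one: with the least-significant digit first, each invocation only needs the two digits under the heads plus the carry, and never the total length. A sketch of that per-invocation step (plain Python standing in for the hypothetical tape-head model; the model itself is not specified in the thread):

```python
def add_step(da, db, carry):
    """One 'invocation': read one digit from each tape, emit one output digit.
    In the hypothetical tape-head model this would be a single forward pass,
    with the heads advancing one cell between calls."""
    total = da + db + carry
    return total % 10, total // 10

def add_little_endian(a_digits, b_digits):
    # digits are stored least-significant first, so processing is purely local
    n = max(len(a_digits), len(b_digits))
    a = a_digits + [0] * (n - len(a_digits))
    b = b_digits + [0] * (n - len(b_digits))
    out, carry = [], 0
    for da, db in zip(a, b):
        d, carry = add_step(da, db, carry)
        out.append(d)
    if carry:
        out.append(carry)
    return out

# 12345678900 + 1, digits little-endian
print(add_little_endian([0, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1], [1]))
```

The "few redundant dimensions" suggestion would correspond to giving the carry (and the threshold logic around it) some slack in a continuous state space, so that gradient descent has room to find the discrete solution.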