18al | 1 day ago
For instance, the current high-score model (311 params [0]), when given 12345678900 + 1, responds with 96913456789.
An interesting experiment would be: what's the minimum number of parameters required to handle unbounded addition (without offloading it to tool calls)?
Of course, memory constraints would preclude such an experiment. So a sensible proxy would be: what kind of neural-net architecture and training would allow a model to handle number lengths it hasn't been trained on? I suspect this may not be possible.
blackbear_ | 15 hours ago
A recurrent neural network implementing binary addition with carry could do this, and one can derive the correct weights with pen and paper without too much effort.
Whether gradient descent will find them too is another matter entirely.
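As a sketch of what such hand-derived weights look like: below is a minimal recurrent unit with threshold (step) activations, where the carry is the hidden state. The weights are set by hand, not learned, so it generalizes to any bit length, including lengths never seen during training. (The function names and bit-width choices here are illustrative, not from the thread.)

```python
def step(x):
    # Hard threshold activation: fires iff the weighted sum is positive.
    return 1 if x > 0 else 0

def rnn_add(a_bits, b_bits):
    """Add two binary numbers, least-significant bit first.

    The recurrent hidden state is the single carry bit; each timestep
    applies the same fixed, hand-derived weights.
    """
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s = a + b + carry                        # weighted sum of inputs + recurrent state
        new_carry = step(s - 1.5)                # fires when at least 2 of the 3 bits are set
        out_bit = step(s - 2 * new_carry - 0.5)  # parity of s: the sum bit
        out.append(out_bit)
        carry = new_carry
    out.append(carry)                            # final carry-out
    return out

def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))
```

Because the same two threshold units are reused at every timestep, the network handles the example from upthread correctly at a length it was never "trained" on: `from_bits(rnn_add(to_bits(12345678900, 40), to_bits(1, 40)))` gives 12345678901.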
wizzwizz4 | 18 hours ago