(no title)
kstrauser | 1 day ago
And for that matter, what does it do with 9-digit numbers? Like, is it more accurate with them, or are these little guys mainly good at adding numbers with exactly 10 digits?
Basically, are the failure modes a gentle increase in inaccuracy, or spectacular failure outside their parameters?
SlinkyOnStairs | 1 day ago
The set of 11-digit numbers with any given failure mode (or even successful output) has no discernible pattern, merely whatever randomness the training process baked into the model.
You can't predict ahead of time when they will fail spectacularly, nor draw a clear boundary around the failure cases. An early major example of this was the "glitch tokens" introduced into most LLMs by training on Reddit data.
But there is an "in general"/"average failure rate across all inputs of a given size" answer: LLM performance drops off a cliff once the input reaches too much complexity (a "┐"-shaped curve). In contrast to humans: ask a child to add two N-digit numbers and the error rate will be approximately linear in N.
18al | 1 day ago
For instance, the current high-score model (311 params [0]), when given 12345678900 + 1, responds with 96913456789.
An interesting experiment would be: what's the minimum number of parameters required to handle unbounded addition (without offloading it to tool calls)?
Of course, memory constraints would preclude such an experiment. So a sensible proxy would be: what kind of neural-net architecture and training would allow a model to handle number lengths it hasn't been trained on? I suspect this may not be possible.
[0] https://github.com/rezabyt/digit-addition-311p
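That proxy experiment can be sketched as a probe that scores any addition routine on exact-match accuracy per operand length; `model_add` below is a hypothetical stand-in for the trained model (any callable `(a, b) -> int`), since the actual inference code isn't shown here:

```python
import random

def length_generalization_probe(model_add, digit_lengths, trials=200, seed=0):
    """Measure exact-match accuracy of an addition 'model' per operand length.

    `model_add` is any callable (a, b) -> int; a hypothetical stand-in for
    the neural model. Accuracy collapsing at lengths just beyond the
    training range would suggest length-specific circuits rather than a
    general carry algorithm.
    """
    rng = random.Random(seed)
    accuracy = {}
    for n in digit_lengths:
        lo, hi = 10 ** (n - 1), 10 ** n - 1  # exactly-n-digit operands
        correct = sum(
            model_add(a, b) == a + b
            for a, b in ((rng.randint(lo, hi), rng.randint(lo, hi))
                         for _ in range(trials))
        )
        accuracy[n] = correct / trials
    return accuracy
```

Running it with training-range lengths and a few out-of-range lengths would directly show whether the failure mode is a gentle slope or a cliff.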
blackbear_ | 11 hours ago
A recurrent neural network implementing binary addition with carry could do this, and one can derive the correct weights with pen and paper without too much effort.
Whether gradient descent will find them too is another matter entirely.
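A minimal sketch of such hand-derived weights (construction and names are illustrative, not from any trained model): the recurrent state is the carry bit, and with threshold units h_k firing iff a+b+carry ≥ k, the sum bit is h1 − h2 + h3 and the next carry is h2.

```python
import numpy as np

# Hard-threshold activation (step function)
step = lambda x: (x > 0).astype(float)

def rnn_add(a: int, b: int) -> int:
    """Add two non-negative integers with a hand-weighted recurrent cell.

    Bits are fed LSB-first; the hidden state carried between steps is the
    carry bit. Each hidden unit h_k fires iff a_bit + b_bit + carry >= k,
    so sum_bit = h1 - h2 + h3 (three-way XOR) and new carry = h2 (majority).
    """
    W = np.ones((3, 3))                  # every unit sums all three inputs
    bias = np.array([-0.5, -1.5, -2.5])  # thresholds at >=1, >=2, >=3

    carry, result, shift = 0.0, 0, 0
    while a or b or carry:
        x = np.array([a & 1, b & 1, carry], dtype=float)
        h = step(W @ x + bias)
        sum_bit = h[0] - h[1] + h[2]
        carry = h[1]
        result |= int(sum_bit) << shift
        shift += 1
        a >>= 1
        b >>= 1
    return result
```

Because the loop runs one step per bit, the same fixed weights handle operands of any length, which is exactly the property in question.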
wizzwizz4 | 14 hours ago