If being probabilistic prevented learning deterministic functions, transformers couldn’t learn addition either. But they can, so that can't be the reason.
Are you sure? I bet if you pull 10 people off the street and ask them to multiply two 5-digit numbers by hand, you won't get a 100% success rate.
wat10000|4 months ago
ddingus|4 months ago
When I multiply, I take it in chunks.
Put the LLM into a loop, instruct it to keep track of where it is, and have it solve one digit at a time.
I bet it does just fine. See my other comment as to why I think that is.
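
Here's a rough sketch of the loop I mean, with plain Python standing in for the model. The names `step` and `long_multiply` are made up for illustration. In the real setup each call to `step` would be a prompt to the LLM, and the outer loop is the bookkeeping that tracks where it is. This only shows the shape of the chunking, not that a model would actually get every step right.

```python
def step(a_digit, b_digit, existing, carry):
    """One small chunk of work: multiply two single digits, then add
    whatever is already in that column plus the incoming carry.
    (Hypothetical stand-in for a single LLM call.)"""
    total = a_digit * b_digit + existing + carry
    return total % 10, total // 10  # (digit to write down, carry to pass on)


def long_multiply(a, b):
    """Schoolbook multiplication of two non-negative ints, one digit pair at a time."""
    # Least significant digit first, so index == column position.
    a_digits = [int(ch) for ch in reversed(str(a))]
    b_digits = [int(ch) for ch in reversed(str(b))]
    result = [0] * (len(a_digits) + len(b_digits))

    for i, b_digit in enumerate(b_digits):
        carry = 0
        for j, a_digit in enumerate(a_digits):
            # The loop, not the solver, keeps track of position (i, j).
            result[i + j], carry = step(a_digit, b_digit, result[i + j], carry)
        # Whatever carry is left goes into the next column up.
        result[i + len(a_digits)] += carry

    return int("".join(str(d) for d in reversed(result)))


if __name__ == "__main__":
    a, b = 12345, 67890
    print(long_multiply(a, b) == a * b)
```

Each step only ever sees four small numbers, so no single step needs to hold the whole problem at once.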
krackers|4 months ago