
AlSweigart | 7 months ago

LLMs are really not good at following specific processes like math. They operate off vibes.

Ask Claude to multiply two ten-digit numbers. It gets the first one or two digits correct, and then makes up the rest.

ChatGPT used to have the same problem, but now it writes a program to perform the math for it.
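For illustration, this is roughly the kind of program a code-running model can emit instead of guessing digits. The specific numbers here are arbitrary examples; the point is that Python integers are arbitrary precision, so the product is exact:

    # Exact multiplication of two ten-digit numbers via code execution,
    # rather than predicting the digits token by token.
    a = 9876543210
    b = 1234567890
    print(a * b)  # 12193263111263526900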


yunyu | 7 months ago

This was true up until they started training them using Reinforcement Learning from Verifier Feedback (starting with o1). By sticking a calculator in the training loop, they seem to have gotten out of the arithmetic-error regime. That said, the ChatGPT default is 4o, which is still susceptible to these issues.
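A minimal sketch of what "a calculator in the training loop" could look like: a verifier that scores the model's arithmetic answer against exact integer math and returns a reward. The function name and reward scheme are hypothetical, not OpenAI's actual training code:

    # Hypothetical verifier reward for arithmetic: the model's answer is
    # checked against Python's exact integer multiplication, and the
    # resulting 0/1 score is fed back into the policy update.
    def arithmetic_reward(a: int, b: int, model_answer: str) -> float:
        try:
            return 1.0 if int(model_answer.replace(",", "")) == a * b else 0.0
        except ValueError:
            return 0.0  # unparseable answers get no reward

    print(arithmetic_reward(9876543210, 1234567890, "12193263111263526900"))  # 1.0
    print(arithmetic_reward(9876543210, 1234567890, "12193263111263526901"))  # 0.0

Training against ground truth like this, rather than imitating digit patterns in text, is what would push the model out of the "first two digits right, rest made up" regime.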