chickenimprint | 9 days ago
This is what Mistral outputs:
The result of multiplying 63,157,997,633 by 63,114,90,009 is:
3,965,689,999,999,999,999,999 (approximately 3.966 × 10²⁴).
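That's badly off: the integer is about one order of magnitude too large (the correct product is roughly 3.986 × 10²⁰), the scientific notation (10²⁴) doesn't even match the full integer (about 10²¹), and the mantissa is also slightly wrong.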
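For anyone who wants to check the size of the gap, here is a minimal Python sketch using exact integer arithmetic. It assumes the intended second operand is 6,311,490,009 (the value that appears in the reasoning trace quoted further down the thread); `orders_off` is just an illustrative helper name, not anything from a library.

```python
import math

a = 63157997633
b = 6311490009  # assumption: the intended operand, taken from the trace quoted later

# Python integers are arbitrary precision, so this product is exact.
exact = a * b

claimed_integer = 3965689999999999999999  # the full integer Mistral printed
claimed_sci = 3.966e24                    # the scientific notation Mistral printed


def orders_off(claimed, truth):
    """Return how many orders of magnitude `claimed` is above `truth`."""
    return math.log10(claimed / truth)


print(exact)
print(round(orders_off(claimed_integer, exact), 2))
print(round(orders_off(claimed_sci, exact), 2))
```

The two printed ratios show separately how far the integer and the scientific notation are from the exact product, which is why they disagree with each other as well as with the true answer.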
simianwords | 9 days ago
GPT-5 Pro without tools can easily solve your question and much harder ones.
A better way to falsify this claim is to ask whether any model exists that can perform these calculations reliably.
Otherwise we can always find the worst 1B-parameter model to falsify any claim made about LLMs.
AlexeyBrin | 9 days ago
The only way to be sure a model calls no tool is to run it locally and control the network.
chickenimprint | 9 days ago
If you try using ChatGPT, you'll see that it goes into reasoning mode (it flashes the text "Working") and takes a while when asked to do arithmetic. Chatbots that let the user view the reasoning chain of thought, such as DeepSeek or Mistral, reveal that they work exactly as I described:
====
Thought for 1s
Thinking
Okay, the user wants to multiply two large numbers: 63,157,997,633 and 6,311,490,009. Since these numbers are quite large, I should use the code interpreter to handle this calculation accurately. Let's write a simple piece of code to compute this product.
Code (python):
# Calculating the product of the two numbers
number1 = 63157997633
number2 = 6311490009
product = number1 * number2
product
Result:
398621071049125148697
Thinking
Great, the code interpreter has given us the product of the two numbers. Now, let's present the result to the user.
====
Falsifying my claim is simple: Find a model that verifiably does basic arithmetic correctly without relying on any external tools.
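A rough sketch of what such a check could look like in Python. `ask_model` is a hypothetical callable (prompt string in, reply string out) that you would wire to whatever model you are testing; `check_multiplication` and `oracle_model` are illustrative names too. The harness only grades answers against exact integer arithmetic. It cannot tell whether the model used a tool, so that part still depends on how you run the model.

```python
import random
import re


def check_multiplication(ask_model, trials=20, digits=11, seed=0):
    """Ask `ask_model` for random products and return the cases it got wrong.

    `ask_model` is a hypothetical stand-in for a real model call.
    """
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        a = rng.randrange(10 ** (digits - 1), 10 ** digits)
        b = rng.randrange(10 ** (digits - 1), 10 ** digits)
        reply = ask_model(f"What is {a} * {b}? Reply with the integer only.")
        cleaned = reply.replace(",", "").strip()
        # Python ints are exact, so a * b is the ground truth.
        if not cleaned.isdigit() or int(cleaned) != a * b:
            failures.append((a, b, reply))
    return failures


def oracle_model(prompt):
    """Stand-in 'model' that cheats by computing the product directly."""
    a, b = (int(n) for n in re.findall(r"\d+", prompt))
    return f"{a * b:,}"


if __name__ == "__main__":
    print(len(check_multiplication(oracle_model)))
```

Swap `oracle_model` for a function that queries a real model; a model that does the arithmetic reliably should produce an empty failure list across many seeds and digit lengths.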
rybosworld | 9 days ago
How are you able to use GPT-5 with tools turned off? Do you mean external tools (like searching the web)?
My understanding is that GPT models always have access to Python, and that it isn't something you can turn off.