
rybosworld | 9 days ago

I see that your prompt includes 'Do not use any tools. If you do, write "I USED A TOOL"'

This is not a valid experiment, because GPT models always have access to certain tools and will use them even if you tell them not to. They will fib the chain of thought after the fact to make it look like they didn't use a tool.

https://www.anthropic.com/research/alignment-faking

It's also well established that all the frontier models use Python for math problems, not just the GPT family of models.

simianwords | 9 days ago

Would it convince you if we used the GPT Pro API and explicitly disallowed tool access?

Is that enough to falsify?

rybosworld | 9 days ago

No, it wouldn't be enough to falsify.

This isn't an experiment a consumer of the models can actually run. As the article I linked explains, it is difficult even for the model maintainers (OpenAI, Anthropic, etc.) to look inside a model and see what it actually used in its reasoning process. The models will purposefully hide information about how they reasoned, and they will ignore instructions without telling you.

The problem really isn't that LLMs can't get math/arithmetic right sometimes. They certainly can. The problem is that there's a very high probability that they will get the math wrong. Python and similar tools were the answer to that inconsistency.
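A minimal illustration of why delegating to an interpreter helps (the numbers are made up for the example): Python evaluates large-integer arithmetic exactly, with arbitrary-precision integers, whereas a model generating the answer digit by digit can drift on any token.

```python
# Python's ints are arbitrary precision, so large multiplications
# are exact -- no overflow, no rounding, no per-digit guessing.
a = 123456789123456789
b = 987654321987654321

product = a * b  # computed exactly by the interpreter

# The result round-trips: dividing the product by one factor
# recovers the other exactly.
assert product // b == a

print(product)
```

This is why "the model did the math" and "the model wrote code that did the math" are very different claims: the second path is deterministic, the first is sampling.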

chickenimprint | 9 days ago

As far as I know, you can't disable the python interpreter. It's part of the reasoning mode.

If you ask ChatGPT, it will confirm that it uses the Python interpreter to do arithmetic on large numbers. That alone should be convincing.

jibal | 9 days ago

It's not falsifiable because it's not false.