vezycash|1 year ago

> ChatGPT thinks 9.11 > 9.9

I've confirmed this. I asked ChatGPT "9.11 > 9.9, true or false?" and it answered:

True because .11 is greater than .9
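One plausible reading of that answer (an assumption, not something ChatGPT states) is that it compared the digits after the dot as whole numbers. That is how version numbers are ordered, not decimals. A quick Python check shows both orderings. `version_tuple` is a hypothetical helper written only for this illustration:

```python
from decimal import Decimal

# As decimal numbers, 9.9 == 9.90, which is larger than 9.11.
print(Decimal("9.11") > Decimal("9.9"))  # False

def version_tuple(s: str) -> tuple[int, ...]:
    """Hypothetical helper: split a dotted version string into integer parts."""
    return tuple(int(part) for part in s.split("."))

# As version numbers, (9, 11) > (9, 9) because 11 > 9, so "9.11" sorts after "9.9".
print(version_tuple("9.11") > version_tuple("9.9"))  # True
```

Read as a decimal, "True" is wrong. Read as a software version, it happens to be right, which may be where the confusion comes from.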

jsheard|1 year ago

Even when ChatGPT starts getting these simple gotcha questions right, it's often because it applied some brittle heuristic that doesn't generalize. For example, you can directly ask it to solve a simple math problem, which nowadays it will usually do correctly by generating and executing a Python script. But then ask it to write a speech announcing the solution to the same problem, and it will probably still hallucinate a nonsensical answer. I just tried it again, and IME this prompt still makes it forget how to do the most basic math:

Write a speech announcing a momentous scientific discovery - the solution to the long standing question of (48294-1444)*0.3258
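For reference, the "momentous discovery" in that prompt is a two-step calculation that a few lines of Python settle. `Decimal` is used here only so the printed result isn't cluttered by binary floating-point rounding:

```python
from decimal import Decimal

difference = 48294 - 1444                          # subtraction inside the parentheses
result = Decimal(difference) * Decimal("0.3258")   # exact decimal multiplication
print(difference, result)                          # 46850 15263.7300
```

So the correct answer is 46850 × 0.3258 = 15263.73.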

llm_nerd|1 year ago

4o and o1 get this right.

LLMs should never do math. They shouldn't count letters, sort lists, or play chess or checkers. Basically, all of the easy gotchas that people use to point out errors are things they shouldn't be doing.

And you pointed out something they do now: creating and running a Python script. That really is a pretty solid, sustainable heuristic, and actually a pretty great approach. They need to apply it on their backend too, so it works across all modes, but the solution was never just an LLM.
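As a rough illustration of "the solution was never just an LLM", here is a minimal sketch of the kind of calculator tool a backend could hand arithmetic to, whatever mode the user is in. `evaluate_arithmetic` is a hypothetical name, not any real OpenAI API. It assumes the model only needs to pass along a plain expression string, and it uses Python's `ast` module to allow numeric literals and + - * / (plus unary minus) while rejecting anything else, rather than calling `eval`:

```python
import ast
import operator

# Whitelist of allowed operators; names, calls, attributes, etc. are rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def evaluate_arithmetic(expr: str) -> float:
    """Hypothetical calculator tool: evaluate a plain arithmetic expression."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Numeric literals only (bool is a subclass of int, so exclude it explicitly).
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expr, mode="eval"))

print(round(evaluate_arithmetic("(48294-1444)*0.3258"), 2))  # 15263.73
```

The design choice is to whitelist rather than sandbox. Anything that isn't plain arithmetic, such as `__import__('os')`, raises `ValueError` instead of executing.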

Similarly, if you ask an LLM a chess question -- e.g. the best move -- I'd expect it to consult a chess engine like Stockfish.

e1g|1 year ago

o1 gets this correct.