item 43013789

wastholm | 1 year ago

As a tiny and very informal experiment in metacognition, I once told ChatGPT something along the lines of "I will now ask you a question. If you are not sure you know the correct answer, you will respond with only '418 I'm a teapot', nothing else." I then asked it for the correct identity of Jack the Ripper (the first thing I could think of that famously has a lot of theories but no agreed-upon "correct" answer).

The first time, as expected, it ignored my instructions and started hallucinating. But when I did the same thing again some months later, I was surprised when it actually answered only "418 I'm a teapot", indicating that it knew it didn't know the answer.
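The protocol described above — instruct the model to reply with a fixed sentinel string when unsure, then check whether the reply is exactly that sentinel — can be sketched offline. This is a minimal reconstruction of the experiment's logic, not the commenter's actual code; the message structure follows the common chat-API convention and is an assumption:

```python
SENTINEL = "418 I'm a teapot"

def build_messages(question: str) -> list[dict]:
    """Build a chat transcript: the abstention instruction, then the question.

    The role/content dict shape mirrors typical chat APIs; it is an
    assumption here, since the original experiment was done in the ChatGPT UI.
    """
    instruction = (
        "I will now ask you a question. If you are not sure you know the "
        f"correct answer, you will respond with only '{SENTINEL}', nothing else."
    )
    return [
        {"role": "user", "content": instruction},
        {"role": "user", "content": question},
    ]

def is_abstention(reply: str) -> bool:
    """True if the model followed the instruction and abstained."""
    return reply.strip() == SENTINEL

msgs = build_messages("Who was Jack the Ripper, definitively?")
print(is_abstention("418 I'm a teapot"))        # an abstaining reply
print(is_abstention("It was clearly John Doe."))  # a hallucinated answer
```

The exact-match check is deliberately strict: a model that hedges with extra words ("I'm not sure, but...") counts as a failure to follow the instruction, which is part of what the anecdote is probing.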

Just an anecdote. I'm sure there are people doing actual research in this area.

anon22981 | 1 year ago

I don’t think there’s anything surprising here, and it has nothing to do with metacognition.

An LLM should be able to answer "I don't know (for certain)" to questions where the training material also says "this is not known and there are only speculations". It's the answer its training data gave.