top | item 40139821


markwkw | 1 year ago

You can easily demonstrate that an LLM knows a certain fact X, AND demonstrate that the same LLM will deny knowing fact X (or be flaky about it, randomly denying and divulging the fact).

There are two explanations: A. It lacks self-reflection. B. It knows that it knows fact X, but avoids acknowledging it for ... reasons?

I find the argument for A quite compelling.

discuss


astrange | 1 year ago

> demonstrate that the LLM will deny that they know fact X (or be flaky about it, randomly denying and divulging the fact)

No, the sampling algorithm you used to query the LLM does that. Not the model itself.

e.g. https://arxiv.org/pdf/2306.03341.pdf

> B. They know they know fact X, but avoid acknowledging for ... reasons?

That reason being that the sampling algorithm didn't successfully sample the answer.
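To make the distinction concrete, here is a toy sketch (my own illustration, not from the linked paper): even when a model's logits clearly favor the correct token, temperature sampling draws from the full distribution, so at higher temperatures the "known" answer is sometimes skipped. The token names and logit values below are made up for the example.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at a given sampling temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-scaled distribution."""
    return rng.choices(tokens, weights=softmax(logits, temperature), k=1)[0]

# Hypothetical next-token candidates: the model's logits "know" the answer.
tokens = ["Paris", "Lyon", "I don't know"]
logits = [4.0, 1.0, 2.0]  # correct token has the highest logit

rng = random.Random(0)
for temp in (0.1, 1.0, 2.0):
    draws = [sample(tokens, logits, temp, rng) for _ in range(1000)]
    print(f"temperature={temp}: answered correctly "
          f"{draws.count('Paris') / 1000:.0%} of the time")
```

At a low temperature the correct token is chosen almost every time; at a high temperature the same model, with the same weights, appears flaky — which is the point: the variability comes from the sampler, not from the model "deciding" to deny the fact.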

throwaway290 | 1 year ago

They will say "it's just a bad LLM" — don't bother.