top | item 43508808

gradascent | 11 months ago

Then why do I never get an “I don’t know” type response when I use Claude, even when the model clearly has no idea what it’s talking about? I wish it did sometimes.

hun3 | 11 months ago

Quoting a paragraph from OP (https://www.anthropic.com/research/tracing-thoughts-language...):

> Sometimes, this sort of “misfire” of the “known answer” circuit happens naturally, without us intervening, resulting in a hallucination. In our paper, we show that such misfires can occur when Claude recognizes a name but doesn't know anything else about that person. In cases like this, the “known entity” feature might still activate, and then suppress the default "don't know" feature—in this case incorrectly. Once the model has decided that it needs to answer the question, it proceeds to confabulate: to generate a plausible—but unfortunately untrue—response.

trash_cat | 11 months ago

Fun fact: "confabulation", not "hallucination", is the correct term for what LLMs actually do.