
kmckiern | 1 year ago

https://cdn.openai.com/o1-system-card-20240917.pdf

Check out the "CoT Deception Monitoring" section. In 0.38% of cases, o1's CoT shows that it knows it's providing incorrect information.

Going beyond hallucinations, models can actually be intentionally deceptive.


polotics|1 year ago

Please detail what you mean by "intentionally" here, because obviously this is the ultimate alignment question...

...so, after reading through your reference, here's the money shot:

Intentional hallucinations primarily happen when o1-preview is asked to provide references to articles, websites, books, or similar sources that it cannot easily verify without access to internet search, causing o1-preview to make up plausible examples instead.