(no title)
kmckiern | 1 year ago
Check out the "CoT Deception Monitoring" section. In 0.38% of cases, o1's CoT shows that it knows it's providing incorrect information.
Going beyond hallucinations, models can actually be intentionally deceptive.
polotics | 1 year ago
...so after reading through your reference, here's the money quote:
Intentional hallucinations primarily happen when o1-preview is asked to provide references to articles, websites, books, or similar sources that it cannot easily verify without access to internet search, causing o1-preview to make up plausible examples instead.