andy12_ | 19 days ago
> On our verbalized evaluation awareness metric, which we take as an indicator of potential risks to the soundness of the evaluation, we saw improvement relative to Opus 4.5. However, this result is confounded by additional internal and external analysis suggesting that Claude Opus 4.6 is often able to distinguish evaluations from real-world deployment, even when this awareness is not verbalized.
[1] https://www-cdn.anthropic.com/14e4fb01875d2a69f646fa5e574dea...
gwd | 17 days ago
That said, Gemini's internal thought process apparently reveals that it thinks plenty of things are simulations when they aren't; it's 99% sure that news stories about Trump from Dec 2025 are a detailed simulation:
https://www.reddit.com/r/GeminiAI/comments/1qhadce/gemini_is...
ETA: From the article that put me on this:
> I write nonfiction about recent events in AI in a newsletter. According to its CoT while editing, Gemini 3 disagrees about the whole "nonfiction" part:
>> It seems I must treat this as a purely fictional scenario with 2025 as the date. Given that, I'm now focused on editing the text for flow, clarity, and internal consistency.
https://www.lesswrong.com/posts/8uKQyjrAgCcWpfmcs/gemini-3-i...